Multiresolution community detection for megascale networks 
by information-based replica correlations 
We use a Potts model community detection algorithm to accurately and quantitatively evaluate
the hierarchical or multiresolution structure of a graph. Our multiresolution algorithm calculates 
correlations among multiple copies ( "replicas" ) of the same graph over a range of resolutions. Signifi- 
cant multiresolution structures are identified by strongly correlated replicas. The average normalized 
mutual information, the variation of information, and other measures in principle give a quantitative 
estimate of the "best" resolutions and indicate the relative strength of the structures in the graph. 
Because the method is based on information comparisons, it can in principle be used with any com- 
munity detection model that can examine multiple resolutions. Our approach may be extended to 
other optimization problems. As a local measure, our Potts model avoids the "resolution limit" that 
affects other popular models. With this model, our community detection algorithm has an accuracy 
that ranks among the best of currently available methods. Using it, we can examine graphs over 40 
million nodes and more than one billion edges. We further report that the multiresolution variant 
of our algorithm can solve systems of at least 200 000 nodes and 10 million edges on a single processor
with exceptionally high accuracy. For typical cases, we find a super-linear scaling, $O(L^{1.3})$ for
community detection and $O(L^{1.3} \log N)$ for the multiresolution algorithm, where L is the number of
edges and N is the number of nodes in the system. 
I. INTRODUCTION



One focus in the study of complex networks is identifying
suspected internal structure, and one characterization
of such structure is in terms of "community" divisions
within a model graph. A recent introduction to the
"physics of networks" can be found in [1]. One feature
of organized structure within these systems is that the 
community divisions can depend on the scale at which 
the system is examined. Different scales correspond to 
distinct community divisions at different internal commu- 
nity edge densities. For many systems, including those 
with hierarchical organization, a "multiresolution" approach
[2] is needed to capture the overall structure and
the relationships between the elements at different resolutions.
Examples of such systems can include biological
processes [3, 4], food webs [5], air transportation networks
[6], and communication networks [7]. Thus, multiresolution
methods are an important extension of problems
in community detection.

Some measures and methods regarding community detection
are reviewed in [8, 9]. Quality functions include
modularity defined by Newman and Girvan [10], a
Potts model originally proposed by Reichardt and Bornholdt
(RB) [11, 12], our Potts model that eliminates
the random partition applied by RB, an application
of a Potts model utilizing a mean-field approximation
with "belief propagation" [13], and another measure
"fitness" [14]. Other approaches include clique percolation
[15, 16], spectral [17], continuous mapping to a conic
optimization problem [18], "label propagation" [19, 20],
dynamical [21, 22], and maximum likelihood [23]. Karrer
et al. [24] defined a measure of robustness of community
structure based on random perturbations. Some efforts
enhance or expand applications to more general systems
such as weighted networks [11, HI, 25], heterogeneous
systems [12, 26], bipartite graphs [27, 28], overlapping
nodes [id, 14, lla, 28, 29], and multiresolution methods.

The multiresolution algorithm presented in this paper 
(1) determines and quantitatively evaluates the relative 
strength of multiresolution structure(s) within a graph 
by examining the correlations among several independent 
solutions ("replicas") of the same graph over a range of 
resolutions. Strong correlations in the normalized mu- 
tual information (NMI) or the variation of information 
(VI) indicate the "best" system resolutions, and the rel- 
ative value of the measure gives a quantitative estimate 
of the strength of the structures. This quantitative eval- 
uation of the best resolution(s) for the system is lacking 
or missing in most other multiscale community detection 
algorithms. (2) The method is not limited to hierarchi- 
cal structures but applies to general structures at differ- 
ent scales. (3) Our approach is based on relative infor- 
mation comparisons, so it can in principle be used with 
any community detection model that can target different 
resolutions. (4) The underlying Potts model and com- 
munity detection algorithm demonstrate an accuracy at 
least equal to the best methods currently available (see
Appendix A) [12]. The model is robust to the effects of
noise (see Appendices A and B); and as a local measure, it
is free of the "resolution limit" [30] as discussed in the
literature [8, 12, 31, 32]. (5) With improvements discussed
in Sec. IV, it is competitive with the best algorithms
currently available both in terms of speed and possible
system size. A single community solution can handle
systems as large as 40 million nodes and one billion edges
with a computational time of 3.7 hours (see Appendix
C) [33]. (6) Our multiresolution algorithm is extremely
accurate for large systems (see Sec. VII). (7) We apply
it to megascale systems with over 10 million edges and
200 000 nodes with a run time of about 4.6 hours on a
single processor [33]. The algorithm should adapt very
efficiently to parallel or distributed computing methods 
enabling larger systems to be studied. 

Hierarchical organization is the most obvious type of
multiresolution structure. Some earlier work on hierarchies
in graphs can be found in [34, 35]. Examples of
more recent efforts in analyzing hierarchical structures
in graphs are 0, H, [3, HH, Hl|. Arenas et al. [36] defined
a multiresolution method using modularity that
makes novel use of the resolution limit [30]. Reichardt
and Bornholdt [11], Arenas et al. [36], Kumpula and
coworkers [37], and Heimo et al. [38] also study
multiresolution applications of an RB Potts model.

In this paper we will show, for the first time, how infor- 
mation theory based measures may be used to systemati- 
cally extract the best community partitions on all scales. 
This will enable us to methodically determine the hierar- 
chical or multiresolution structure of arbitrary networks. 
In Sec. II we first briefly review the information measures
that we employ. Then in Secs. III and IV we briefly
discuss our Potts model and community detection algorithm,
followed by an explanation of their applications to
multiresolution analysis in Sec. V. We then present several
examples in Sec. VI. The exceptional accuracy of the
multiresolution algorithm is addressed in Sec. VII, and
we conclude in Secs. VIII and IX. Details concerning the
high accuracy and large size limit of the underlying
community detection algorithm are relegated to Appendixes
A and C, respectively. Appendix B demonstrates an
example of new transition effects in community detection
(such transitions directly affect replica correlations).
Appendix D explains a generalization of our replica method
for other, nongraph theoretical, optimization problems.
Appendix E elaborates on some details related to the
benchmark accuracy test discussed in Sec. VII.



II. INFORMATION THEORY MEASURES 

The normalized mutual information $I_N$ and the variation
of information V provide methods of comparing
one proposed community division to another. In order
to define $I_N(A, B)$ or V(A, B) between two partitions A
and B, we first ascribe a Shannon entropy H(A) for an
arbitrary community partition A. We assign the probability
that a given node will fall in community k as
$P(k) = n_k/N$, where $n_k$ is the number of nodes in
community k and N is the total number of nodes in the
system. Then the Shannon entropy is

\[
H(A) = -\sum_{k=1}^{q_A} \frac{n_k}{N} \log \frac{n_k}{N}, \tag{1}
\]

where $q_A$ is the number of communities in partition A.

Mutual information I(A, B) was developed within information
theory. It evaluates how similar two data sets
are in terms of information contained in both sets of data.
The mutual information between two partitions A and B
of a graph is calculated by defining a "confusion matrix"
for the two community partitions. The confusion matrix
specifies how many nodes $n_{ij}$ of community i of partition
A are in community j of partition B. Mutual information
I(A, B) is defined as

\[
I(A, B) = \sum_{i=1}^{q_A} \sum_{j=1}^{q_B} \frac{n_{ij}}{N} \log \frac{n_{ij} N}{n_i n_j}, \tag{2}
\]

where $n_i$ is the number of nodes in community i of partition
A and $n_j$ is the number of nodes in community j of
partition B. An interesting generalized mutual information
is also defined in [39]. Danon et al. [Z(| suggested
that a normalized variant [40] of mutual information be
adapted for use in evaluating similar community partitions.
Using Eqs. (1) and (2), the normalized mutual
information $I_N(A, B)$ between partitions A and B is

\[
I_N(A, B) = \frac{2 I(A, B)}{H(A) + H(B)}, \tag{3}
\]

which can take values in the range $0 \le I_N(A, B) \le 1$.
Fred and Jain [41] introduced, for computer vision problems,
a single resolution application of NMI that we use
in our work.

The variation of information [42] is a metric in the
formal sense of the term and measures the "distance" in
information between two partitions A and B. Using Eqs.
(1) and (2), V(A, B) is calculated by

\[
V(A, B) = H(A) + H(B) - 2 I(A, B). \tag{4}
\]

As an information distance, low values of V(A, B) indicate
better agreement between partitions A and B. VI
has a range $0 \le V(A, B) \le \log N$. It is sufficient and
even preferable to use the un-normalized version of VI.
We utilize both NMI and VI to demonstrate that our
approach is not limited to a specific measure.
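
As an illustration, the measures of Eqs. (1)-(4) can be evaluated directly from two lists of community labels. The following is a minimal Python sketch rather than the authors' implementation; the function names and the label-list representation of a partition are assumptions made for this example, and base 2 logarithms are used.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Eq. (1): H(A) = -sum_k (n_k/N) log2(n_k/N)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Eq. (2), summed over the nonzero entries of the confusion matrix."""
    n = len(a)
    n_a, n_b = Counter(a), Counter(b)
    n_ab = Counter(zip(a, b))  # confusion matrix n_ij (nonzero entries only)
    return sum((c / n) * log2(c * n / (n_a[i] * n_b[j]))
               for (i, j), c in n_ab.items())


def nmi(a, b):
    """Eq. (3): normalized mutual information."""
    h_sum = shannon_entropy(a) + shannon_entropy(b)
    if h_sum == 0:
        # Assumed convention: two single-community partitions are identical.
        return 1.0
    return 2 * mutual_information(a, b) / h_sum


def vi(a, b):
    """Eq. (4): variation of information."""
    return shannon_entropy(a) + shannon_entropy(b) - 2 * mutual_information(a, b)
```

Relabeling the communities of a partition leaves both measures unchanged, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` describes perfect agreement.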

The mutual information I and Shannon entropy H also
play a supplemental role in determining multiresolution
structure. For the Shannon entropy H we average over
all replicas using

\[
\langle H \rangle = \frac{1}{r} \sum_{A} H(A). \tag{5}
\]

For $I_N$, V, and I, we calculate the average of the measures
over all pairs of replicas with

\[
\langle S \rangle = \frac{2}{r(r - 1)} \sum_{A > B} S(A, B), \tag{6}
\]

where S is any of the information measures and r is the
number of replicas. We use base 2 logarithms in all
information calculations.
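
The replica averages can be sketched in a few lines of Python. This is an illustrative example only: `replica_average`, `pair_average`, and the toy `agreement` measure are hypothetical names introduced here, and any pairwise measure (NMI, VI, or mutual information) could be passed in its place.

```python
from itertools import combinations
from statistics import mean


def replica_average(replicas, single_measure):
    """Average of a one-partition measure (e.g. H) over all r replicas."""
    return mean(single_measure(a) for a in replicas)


def pair_average(replicas, pair_measure):
    """Average of a two-partition measure over the r(r-1)/2 replica pairs."""
    pairs = list(combinations(replicas, 2))
    # Dividing by len(pairs) equals the prefactor 2 / (r (r - 1)).
    return sum(pair_measure(a, b) for a, b in pairs) / len(pairs)


def agreement(a, b):
    """Toy stand-in measure (an assumption): fraction of equal labels."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For example, `pair_average(replicas, agreement)` with three replicas averages over the three distinct pairs.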
Similarly, higher order cumulants of S can be computed
with a (replica symmetrically weighted) probability
distribution function that we set to be

\[
P(S) = \frac{2}{r(r - 1)} \sum_{A > B} \delta[S - S(A, B)]. \tag{7}
\]

In Eq. (7), $\delta[S - S(A, B)]$ is the Dirac delta function.
For any function f of S, the expectation value of f is

\[
\langle f \rangle = \int dS \, P(S) f(S). \tag{8}
\]

Formally, in our probability distribution of Eq. (7), the
information measure S plays a role analogous to the overlap
parameter in spin-glass problems.



III. POTTS MODEL HAMILTONIAN 

We briefly review our Potts model approach to community
detection [12]. Generally speaking, community



detection algorithms based on quality functions begin a 
community evaluation by measuring the number of con- 
nected nodes within (or outside) proposed communities. 
In general, these edges can be weighted or unweighted. 
The quality function must contrast this measure to some 
"expected" value or directly evaluate missing connections 
in some manner. If a linear addition of edge weights 
(connected and unconnected) is applied, the constructed 
model is equivalent to a Potts model spin system. As it 
applies to community detection, such a model was first
proposed by RB [11], which demonstrated a clear bridge
between community detection methods and statistical 
physics. RB implemented their model with a weighted 
comparison to a random partition (a "null" model) which 
included modularity as a special case. 

In our Potts model, we directly sum the edge weights 
(connected and unconnected) in an energy calculation 
without a weighted null model. Thus we avoid a com- 
parison to the properties of another graph, random or 
otherwise, and despite a global energy sum, we obtain an 
effectively local measure of community structure. As a 
local measure, the model is also free from the resolution 
limit discussed in the literature [3(| HH, HH . 

The general weighted Hamiltonian for our model is

\[
\mathcal{H}(\{\sigma\}) = -\frac{1}{2} \sum_{i \neq j} \left( a_{ij} A_{ij} - \gamma \, b_{ij} J_{ij} \right) \delta(\sigma_i, \sigma_j), \tag{9}
\]

which we refer to as an "absolute Potts model" (APM).
$A_{ij} = 1$ if nodes i and j are connected and is 0 otherwise.
$J_{ij} = (1 - A_{ij})$. The values $\{a_{ij}\}$ and $\{b_{ij}\}$
are general positive weights of the connected and unconnected
edges, respectively, which allow both symmetric
and directed graphs. $\{A_{ij}\}$, $\{J_{ij}\}$, $\{a_{ij}\}$, and $\{b_{ij}\}$ are all
fixed by the definition of the system. $\gamma$ is an externally
defined weighting parameter for the unconnected edge
weights. In practice, we use a symmetric matrix with integer
weights (faster integer computations) on both connected
and unconnected edges ($\gamma$ is a rational number).
$\sigma_i$ is a Potts spin variable that can take an integer value
$1 \le \sigma_i \le q$. The value of $\sigma_i$ for a given node is the model
equivalent of community membership. That is, node i
is a member of community k if $\sigma_i = k$. The number of
spin states q can be specified as a constraint or can be
determined by the lowest energy configuration over all
values of q. The Kronecker delta $\delta(\sigma_i, \sigma_j) = 1$ if $\sigma_i = \sigma_j$
and $\delta(\sigma_i, \sigma_j) = 0$ for $\sigma_i \neq \sigma_j$. As in [21], HH, the
interaction between spins is attractive if they are connected
and repulsive if they are not connected. A further important
feature of the Hamiltonian is that each spin interacts
only with other spins in the same community. The optimal
ground state of Eq. (9) is often difficult to locate in
practice, so we identify the communities of a system by
searching for low-energy states of this Hamiltonian.
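
To make the energy calculation concrete, the following Python sketch evaluates the energy of a given partition in the unweighted, undirected case. It assumes that every connected pair of nodes in the same community contributes -1 and every unconnected pair in the same community contributes +gamma, in line with the attractive and repulsive interactions described above. The function name `apm_energy`, the edge-list input, and the brute-force pair loop are choices made for this illustration only.

```python
def apm_energy(n_nodes, edges, labels, gamma=1.0):
    """Energy of a partition for an unweighted, undirected graph.

    Assumed form: sum over node pairs i < j in the same community of
    -1 if the pair is connected and +gamma if it is not.
    """
    edge_set = {frozenset(e) for e in edges}
    energy = 0.0
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if labels[i] != labels[j]:
                continue  # spins interact only within a community
            if frozenset((i, j)) in edge_set:
                energy -= 1.0    # attractive: connected pair
            else:
                energy += gamma  # repulsive: unconnected pair
    return energy
```

The double loop costs O(N^2) and is meant only to expose the definition; it is not how a large system would be evaluated.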

The edge density of a particular community k is $p_k =
2l/[n(n - 1)]$ where l is the number of edges and n is
the number of nodes in the community. We can relate
the model weight parameter $\gamma$ to the minimum internal
edge density $p_{in}$ for every community. We obtain this
relation from a simple calculation on the minimum number
of interior edges that results in an energy of zero or
less for a single community. An alternative method is to
calculate the minimum number of edges that will merge
two connected communities. Then we can apply an
inductive argument to establish the same inequality. For
unweighted graphs, the relation is

\[
p_{in} \ge \frac{\gamma}{\gamma + 1}. \tag{10}
\]

For a weighted graph, the relation is similar, $p_{in} \ge
\gamma/(\gamma + w)$, where w is the average weight of connected
edges within each community and $p_{in}$ is then the edge
weight density as compared to a maximally connected
community with the same average weight w. These
density relations are useful because the typical internal
community edge density is equivalent to the resolution of a
system. As a result, the resolution for the graph as a
whole is also effectively set by $\gamma$. This property is
distinct from the resolution limit in the literature because
the resolution set by this method is independent of a 
graph's own global parameters [13, HH, HH . 
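
The density relation can be inverted to choose a model weight for a target resolution. The short Python sketch below assumes the relation holds with equality at the minimum density, so that the weight equals w p / (1 - p); the function names are hypothetical and `w = 1` corresponds to the unweighted case.

```python
def gamma_from_density(p_in, w=1.0):
    """Model weight whose minimum internal edge density is p_in."""
    if not 0.0 <= p_in < 1.0:
        raise ValueError("p_in must satisfy 0 <= p_in < 1")
    return w * p_in / (1.0 - p_in)


def min_density(gamma, w=1.0):
    """Minimum internal edge density implied by the model weight gamma."""
    return gamma / (gamma + w)
```

For instance, a minimum density of 0.95 corresponds to a weight of approximately 19 for an unweighted graph.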



IV. COMMUNITY DETECTION ALGORITHM 

We apply the Potts model of Eq. (9) with a simple
community detection algorithm that is nevertheless
extremely accurate, at least as accurate as the best available
algorithms (see Appendix A) when used with our
model [12]. The algorithm sequentially "picks up" each
node and places it in the community that best lowers
the energy based on the current state of the system. We
repeat this process for all nodes and continue iterating
until no moves are found after one full cycle through all
nodes. This part of the dynamical approach is similar to
parts of algorithms used in 0, . We can also choose to
test the communities for possible merges that can arise
due to local minima traps [44]. This test is more important
for heavily weighted graphs with $\gamma \ll 1$. We
can optionally further allow zero-energy moves for difficult
problems. We attempt t independent optimization
trials (generally O(1)) and select the lowest energy
configuration as the solution. Each trial permutes the order
in which the nodes are initially traversed. Appendix B
illustrates the effect of additional trials using a common
benchmark problem with increasing levels of noise.
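
The node-moving procedure described above can be sketched as follows for an unweighted, undirected graph. This is a simplified illustration under stated assumptions, not the authors' code: it assumes each connected intra-community pair contributes -1 and each unconnected intra-community pair contributes +gamma to the energy, it omits the optional merge test and zero-energy moves, and it additionally lets a node leave for a new singleton community when every available option has positive cost. The name `greedy_apm` and its parameters are hypothetical.

```python
import random


def greedy_apm(n_nodes, edges, gamma=1.0, trials=4, seed=0):
    """Greedy node moves; returns (labels, energy) of the best trial."""
    rng = random.Random(seed)
    neighbors = [set() for _ in range(n_nodes)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    def total_energy(labels):
        size, internal = {}, {}
        for c in labels:
            size[c] = size.get(c, 0) + 1
        for u, v in edges:
            if labels[u] == labels[v]:
                internal[labels[u]] = internal.get(labels[u], 0) + 1
        # Per community: -(internal edges) + gamma * (missing internal pairs).
        return sum(-internal.get(c, 0)
                   + gamma * (s * (s - 1) / 2 - internal.get(c, 0))
                   for c, s in size.items())

    best_labels, best_energy = None, float("inf")
    for _ in range(trials):
        labels = list(range(n_nodes))   # symmetric start: one node per community
        size = {c: 1 for c in labels}
        next_label = n_nodes
        order = list(range(n_nodes))
        rng.shuffle(order)              # each trial permutes the traversal order
        moved = True
        while moved:                    # stop after a full cycle with no moves
            moved = False
            for i in order:
                old = labels[i]
                links = {}              # edges from node i into each community
                for j in neighbors[i]:
                    links[labels[j]] = links.get(labels[j], 0) + 1

                def cost(c):
                    # Energy contributed by node i if it sits in community c.
                    others = size[c] - (1 if c == old else 0)
                    k = links.get(c, 0)
                    return -k + gamma * (others - k)

                target, best_cost = old, cost(old)
                for c in links:         # neighbor-community search
                    if cost(c) < best_cost:
                        target, best_cost = c, cost(c)
                if best_cost > 0:       # being alone (cost 0) is better
                    target, next_label = next_label, next_label + 1
                if target != old:
                    size[old] -= 1
                    if size[old] == 0:
                        del size[old]
                    size[target] = size.get(target, 0) + 1
                    labels[i] = target
                    moved = True
        energy = total_energy(labels)
        if energy < best_energy:
            best_labels, best_energy = labels, energy
    return best_labels, best_energy
```

Every accepted move strictly lowers the energy, so each trial terminates in a configuration that is stable under single-node moves. Such a configuration is a local minimum and need not be the ground state.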

The algorithm has been modified to use the intuitive
neighbor-node search such as in 0, 0, |4f| and
a symmetric initial state of one node per cluster also
in [y, 19, 45] and applied in a more general dynamical
context in [21]. We further optimize the algorithm by
allowing it to skip nodes that are already strongly defined
within their respective communities. Empirically,
we find that the neighbor search drastically improves
performance for sparse graphs to $O(N^{1+\beta} Z^{1+\beta} t \log Z)$ for
some small $\beta$ [46, 47], where N is the number of nodes and
Z is the average node degree. The factor of log Z is due
to a neighbor-node binary search for each connection matrix
($A_{ij}$ or $J_{ij}$) evaluation. The factor of NZ is due to
the iteration over all neighbors, and the factor of $\beta$ in
the exponent is due to the number of full NZ iterations,
which depends on the topology of the system, the initial
state of the system, and the resolution being solved (i.e.,
the model weight $\gamma$). This scaling enables us to achieve
systems of at least $O(10^7)$ nodes and $O(10^9)$ edges for a
single application of the algorithm. Details of one of the
large tests are discussed in Appendix C. We have solved
systems up to $O(10^5)$ nodes and $O(10^7)$ edges for the
multiresolution algorithm [33] as discussed in Sec. VI C.

V. MULTIRESOLUTION ALGORITHM 

One challenge in developing a multiresolution algo- 
rithm is that of selecting the best resolution(s) for the 
system. A straightforward method that avoids the
choice of resolution is to iteratively solve the system (with
a necessary change in $\gamma$ for our model) and collapse the
communities into "supernodes" until the system is organized
into a forced hierarchical structure. This approach
is viable; but even when the system is hierarchical in na- 
ture, there is the question of whether the best resolutions 
were resolved at each stage. Our algorithm enables a 
quantitative analysis that determines the best resolutions 
and applies to general types of multiresolution structure. 

A. Motivation 

Ideally, we desire an algorithm that allows the sys- 
tem to communicate what the best resolutions are; but 
without a priori information, the correct weights for 



these resolutions are not obvious in general. In order to 
identify the proper resolutions, we examine information- 
based correlations among independent replicas (indepen- 
dent solutions) via NMI or VI over a range of resolutions. 
Rather than using the replicas to simply identify a unique 
optimized solution for each resolution, we examine corre- 
lations among the entire set. We then select the strongest 
correlations as the best resolutions. 

From a global perspective, the average NMI (between 
all pairs of replicas) indicates how strongly a given struc- 
ture dominates the energy landscape by measuring how 
well the replicas agree with each other. High values 
of the NMI (often manifested as peaks) correspond to 
more dominant, and thus more significant, structures. 
From a local perspective, at resolutions where the system 
has well-defined structure, a set of independent replicas 
should be highly correlated because the individual nodes 
have strongly preferred community memberships. Con- 
versely, for resolutions "in-between" two strongly defined 
configurations, one might expect that independent repli- 
cas will be less correlated due to "mixing" between com- 
peting divisions of the graph. Random effects will usually 
reduce the correlations between independent solutions. 

A similar argument applies to VI where, as an infor- 
mation distance, low values of VI correspond to better 
agreement among replicas. With these information-based 
correlations, we obtain a set of multiresolution partitions 
of the graph, but we also obtain an estimate of the rel- 
ative strength of the structures at each resolution. Note 
that this argument does not distinguish between unre- 
lated multiresolution structures or those that are strictly 
hierarchical in nature although nothing prevents the im- 
position of additional hierarchical constraints if desired. 

Implicit in this argument is the idea that local min- 
ima in the energy landscape represent meaningful, even 
if perhaps incomplete, information about the graph. The 
same assertion was made in [H, for modularity and the 
RB Potts model. Moderate levels of "confusion" caused 
by random or competing effects within a graph do not 
destroy information contained in the global energy land- 
scape, and the replica correlations of our algorithm are 
a measure of the "complexity" of that landscape. As
the noise in the system is increased, we expect the
transition to incoherence (where replicas are weakly correlated)
to occur rapidly (see the end of Sec. VII and a brief
example of an accuracy transition in Appendix B). If an
algorithm can verifiably solve for the global minima of 
a system in most cases, the problem of community de- 
tection is solved in principle. Since this is difficult to do 
in practice, the replica correlations in our algorithm take 
advantage of the fact that we cannot always locate the 
optimal ground state(s). 

In principle, one can also include in Eq. (9) interactions
between each of the r replicas to produce a "free energy"
type functional of the form

\[
F = \sum_{i} \mathcal{H}_i(\{\sigma\}) - T \sum_{i \neq j} S(i, j), \tag{11}
\]

where S(i, j) is an information-based measure (e.g., $I_N$,
V, etc.) between all replica pairs and T is a scale for
this information measure. S(i, j) is maximized when the
community partitions are identical in all replicas. This
information theory measure formally plays a role analogous
to entropy in a free energy functional. T then plays
the role of a "temperature." Sans the first term, the minima
of F in Eq. (11) produce highly correlated random
configurations (a "random high temperature configuration"
of the system which appears without change in all
replicas). Our algorithm in this work will amount to initially
minimizing the first term in F, i.e., $\sum_i \mathcal{H}_i(\{\sigma\})$,
for a set of fixed $\{\gamma_i\}$. Out of this set of replica
configurations, we then ask for which $\gamma_i$ we find a maximum
of the correlations, S(i, j), when this information
theory measure is plotted as a function of $\gamma$. A more
sophisticated version of our algorithm minimizes F directly
with both terms included in each step. The information
theory measures that we employ may also be written for
other (non-graph theoretic) optimization problems with
general Hamiltonians, or cost functions, $\mathcal{H}$ (see Appendix
D).



B. Algorithm 

We start the algorithm with a weighted or unweighted
graph. In Eq. (10), $p_{in}$ is the minimum internal edge
density for each community, and it is equivalent to the
resolution of the system when we minimize Eq. (9). The
algorithm uses Eq. (9) to solve a range of resolutions
$\{p_i\} = [p_0, p_f]$ (decrementing) corresponding to a
particular set of model weights $\{\gamma_i\} = [\gamma_0, \gamma_f]$ as determined
by Eq. (10). It is almost always sufficient to have $\gamma_0 \le 19$
since it corresponds to a minimum community edge density
of $p_0 \ge 0.95$. The final weight $\gamma_f$ is found when the
system is completely reduced. A completely reduced system
is one that is fully collapsed into one community or
one where disjoint sub-graphs will not allow the system
to collapse any further.

Each iteration, we decrement the density $p_i$ by a small
value $\Delta p = 0.05$ (or 0.025 for smaller graphs) and calculate
the corresponding $\gamma_i$. After a threshold value (say
$p_i = 0.1$), we scale $p_i$ by a factor of 1/2 (or 3/4 for smaller
graphs) in order to take sizable steps towards a fully
reduced system (necessary for large systems). One could
readily implement an adaptable step or "fill-in" process
since the order of trials is irrelevant for the result.
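
The stepping rule can be illustrated with a small Python generator of (density, weight) pairs. This sketch assumes the unweighted relation with equality, so that the weight is p / (1 - p). The cutoff `p_stop` is a hypothetical stand-in for the actual stopping criterion (a fully reduced system), which depends on the graph and cannot be expressed as a fixed density.

```python
def resolution_schedule(p_start=0.95, step=0.05, threshold=0.1,
                        factor=0.5, p_stop=1e-4):
    """List of (density, model weight) pairs in decreasing density order."""
    schedule = []
    p = p_start
    while p >= p_stop:
        schedule.append((p, p / (1.0 - p)))   # weight from the density
        if p > threshold:
            p = round(p - step, 12)           # linear decrement; rounding
                                              # removes float drift
        else:
            p = p * factor                    # geometric steps below threshold
    return schedule
```

With the default arguments the densities run 0.95, 0.90, ..., 0.10 and then 0.05, 0.025, and so on, and the first weight is approximately 19.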

The algorithm takes three input parameters: the number
of independent replicas r that will be solved at each
tested resolution, the number of trials per replica t, and
the starting density which we set to be $p_0 = 0.95$
corresponding to $\gamma_0 = 19$. The number of replicas is typically
$8 \le r \le 12$ and is selected based upon how much averaging
(over all replica pairs) is needed or desired. The
number of trials t per replica is generally $2 \le t \le 20$. For
each replica, we select the lowest energy solution among
the t trials as was discussed in Sec. IV. The value of t is
chosen based on how much optimization is necessary to
identify a strong low-energy configuration [44].

The r replicas (and t optimization trials) are generated
by reordering the "symmetric" initialized state of
one node per community. That is, even though the
initialized state is symmetric, the order that we traverse
the list also affects the answer that we obtain. This occurs
because the node-level dynamics of the underlying
community detection algorithm in Sec. IV moves a node
immediately upon identifying the best community
membership given the current state of the system. Utilizing
the r replicas, we then use the information-based measures
of Sec. II to determine the multiresolution structure.
Our algorithm is given by the following steps:

1. Initialize the system. Initialize adjacency matrices
($A_{ij}$ and $J_{ij}$) and weights ($a_{ij}$ and $b_{ij}$) based on the
system definition. Use Eq. (10) and $p_0$ to calculate
the initial model weight $\gamma_0$.

2. Solve all replicas at this resolution $p_i$. Initialize the
current replica to a symmetric state of one node per
community. Use Eq. (9) to solve each replica with
model weight $\gamma_i$ at a cost of $O(N^{1+\beta} Z^{1+\beta} t \log Z)$
per replica [3, 46]. Repeat the process independently
for all r replicas. Each trial and replica
randomly permutes the order in which nodes are
initially traversed in the respective solutions.

3. Calculate the replica $I_N$, V, I, and H information
measures. Use Eq. (1) to calculate H for all replicas
and Eqs. (2)-(4) to calculate I, $I_N$, and V between
all pairs of replicas for this resolution $p_i$ [48].
Calculate the average (see Eqs. (5) and (6)) and the
standard deviation for each measure.

4. Decrement to the next resolution $p_{i+1}$. If $p_i > 0.1$,
decrement $p_{i+1} = p_i - 0.05$ or 0.025 for smaller
graphs. If $p_i \le 0.1$, $p_{i+1} = p_i/2$ or $3p_i/4$ for
smaller graphs. Calculate the model weight $\gamma_{i+1}$
by Eq. (10). Return to Step 2 until the system
is not further reducible (fully collapsed or disjoint
sub-graphs will not collapse).

5. Evaluate results. For the range of model weights
$\{\gamma_i\}$, plot each average $I_{N,i}$, $V_i$, $I_i$, and $H_i$ versus
$\gamma_i$. Determine the strongest correlations ($I_N$
high or V low) in these plots (see Figs. 2-4, 6,
8, and 10). These strongly correlated regions
correspond to the best multiresolution structure(s) in
the graph. If the correlation is less than "perfect"
($I_N < 1$ and $V > 0$), we choose the lowest energy
replica to be the partition solution. One could
also choose to construct a "consensus" partition
between all of the replicas [l^, 41] at each notable
resolution.
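
The steps above can be sketched as a short driver loop. In this Python illustration the solver and the information measure are passed in as callables, so the names `solve_replica`, `pair_measure`, and `multiresolution_scan` are assumptions made for the example and do not refer to an existing library. The sketch takes the list of model weights as given and only collects the replica statistics; selecting the notable resolutions from the resulting curve is left to inspection, as in Step 5.

```python
from itertools import combinations
from statistics import mean, pstdev


def multiresolution_scan(gammas, solve_replica, pair_measure, n_replicas=8):
    """For each model weight, solve r replicas and average a pair measure.

    solve_replica(gamma, seed) -> list of community labels (hypothetical).
    pair_measure(a, b) -> float, e.g. NMI or VI between two partitions.
    """
    results = []
    for gamma in gammas:
        # Step 2: independent replicas, distinguished here by their seed.
        replicas = [solve_replica(gamma, seed) for seed in range(n_replicas)]
        # Step 3: measure over all r(r-1)/2 replica pairs.
        values = [pair_measure(a, b) for a, b in combinations(replicas, 2)]
        results.append({
            "gamma": gamma,
            "mean": mean(values),
            "std": pstdev(values),
            "replicas": replicas,
        })
    return results
```

When the measure is NMI, the weights with the largest `mean` are the candidate resolutions. When it is VI, the smallest values play that role.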

We estimate that the number of resolutions $\{p_i\}$ required
to adequately specify an arbitrary system scales
as $O(\log N)$. The dominant scaling of the algorithm
is almost always Step 2, so we estimate that the overall
scaling is $O(N^{1+\beta} Z^{1+\beta} r t \log N \log Z)$ for some small
$\beta$ [46, 47].

FIG. 1: (Color online) Heterogeneous hierarchical systems
corresponding to the plots in Fig. 2 for panel (a) and the
plots in Fig. 4 for panel (b). In panel (a), the 256 node system
is divided into a three-level hierarchy where the unweighted
edge connection probabilities at each level are the following:
level 3 has $p_3 = 0.9$ between nodes in the same community
with community sizes from 5 to 22 nodes (average 16).
Level 2 has $p_2 = 0.3$ between nodes in different constituent
sub-communities with merged community sizes from 33 to 76
nodes. Level 1 is the completely merged system of 256 nodes
with $p_1 = 0.1$ between nodes in different sub-communities.
The average edge density is $\bar{p} = \bar{p}_1 = 0.182$. In panel (b), we
increase the system size to 200 000 nodes. Level 3 has 10 000
communities with sizes from 6 to 37 nodes (average 20). Level
2 has 2500 communities with sizes from 27 to 180 nodes which
are formed by merging two to eight communities from level
3. The density $p_1$ is changed from panel (a) to $p_1 = 0.00031$,
and the average edge density is $\bar{p} = \bar{p}_1 \approx 0.0005$. This larger
system has over ten million edges with approximately 62% of
the edges being random noise between level 2 communities.

FIG. 2: (Color online) Plot of information measures $I_N$, V,
H, and I in panels (a) and (b) vs. the Potts model weight $\gamma$
in Eq. (9) for the three-level heterogeneous hierarchy depicted
in Fig. 1(a). In panel (a), the squares represent the average
replica normalized mutual information $I_N$ (left axis), and the
inverted triangles represent the average mutual information
I (right axis). In panel (b), the triangles represent the average
variation of information V (left axis), and the diamonds
represent the average Shannon entropy H (right axis). For
comparison, the circles in both panels (a) and (b) represent
the average number of clusters q for the same set of replicas
(right-offset axes). In panel (a) the peak $I_N$ values (ia) and
(iia) both accurately correspond to levels 2 and 3, respectively,
of the hierarchy depicted in Fig. 1(a). Similarly in panel (b)
the minimum V values (ib) and (iib) also accurately correspond
to levels 2 and 3, respectively, of the hierarchy. In
panels (a) and (b), both the mutual information I and Shannon
entropy H display a "plateau" behavior corresponding
to the correct solutions. Plateaus in the average number of
clusters q [51] also indicate important structures as in [36].

Structures identified by this algorithm are not neces- 
sarily hierarchical; however, one can augment the algo- 
rithm by imposing an additional hierarchical constraint 
on some fraction of the replicas. Comparisons would then 
be made strictly between all pairs with and without this 
additional constraint. We applied this variation in both
divisive and agglomerative approaches, but in our test- 
ing it only resulted in a modest improvement to the algo- 
rithm's ability to identify the best resolutions. Therefore, 
we use the above algorithm in order to take advantage of 
its generality and relative simplicity. 



VI. EXAMPLES 

We show the results of the multiresolution algorithm
of Sec. V applied to several test cases [Isf. In Secs. VI A
and VI C, we illustrate a small 256 node and a larger
200 000 node hierarchy, respectively, with both systems
depicted in Fig. 1. In Sec. VI B, we examine the structure
of an Erdős-Rényi random graph for comparison to
graphs with known internal structure. We then analyze
two real social networks in Secs. VI D and VI E, where
the respective systems are depicted in Figs. 5 and 7. In
Sec. VII, we also demonstrate the algorithm's exceptional
accuracy for large systems.
A. 256 node hierarchy

The system in Fig. 1(a) depicts a set of 256 nodes for
a constructed three-level heterogeneously-sized hierarchy.
The results are seen in Fig. 2. The unweighted edge connection
probabilities are p_k for k = 1, 2, 3. Level 3 has a
density p_3 = 0.9 between nodes in the same community
with community sizes from 5 to 22 (average 16) nodes.
Level 2 has a density p_2 = 0.3 between nodes in different
constituent sub-communities and is divided into five
groups with merged sizes from 33 to 76 nodes. Level
1 is the completely merged system that has a density
p_1 = 0.1 between nodes in different sub-communities.
These edges provide some system noise. The average densities
of communities at levels 1 and 2 are p = p̄_1 = 0.182
and p̄_2 = 0.470. We use eight replicas and four trials per
replica at a total run time of 6.1 s [33].
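
The construction described above can be illustrated with a short generator. This is only a minimal sketch under stated assumptions: the community sizes and the level 2 grouping below are hypothetical (the exact sizes used for the test system are not listed here), and the function name `planted_hierarchy` is introduced purely for illustration.

```python
import itertools
import random


def planted_hierarchy(level3_sizes, level2_groups, p1=0.1, p2=0.3, p3=0.9, seed=0):
    """Build an unweighted three-level planted hierarchy.

    level3_sizes: sizes of the level 3 communities (hypothetical values).
    level2_groups: list of lists; each inner list holds the indices of the
        level 3 communities merged into one level 2 community.
    Returns (edges, level3_label, level2_label).
    """
    rng = random.Random(seed)
    level3_label = []
    for c, size in enumerate(level3_sizes):
        level3_label.extend([c] * size)
    group_of = {c: g for g, members in enumerate(level2_groups) for c in members}
    level2_label = [group_of[c] for c in level3_label]

    edges = []
    for i, j in itertools.combinations(range(len(level3_label)), 2):
        if level3_label[i] == level3_label[j]:
            p = p3  # same level 3 community
        elif level2_label[i] == level2_label[j]:
            p = p2  # different level 3 communities inside one level 2 group
        else:
            p = p1  # background "noise" edges of the merged system
        if rng.random() < p:
            edges.append((i, j))
    return edges, level3_label, level2_label


# Hypothetical layout: 16 communities of 16 nodes merged into five groups.
sizes = [16] * 16
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11, 12], [13, 14, 15]]
edges, level3, level2 = planted_hierarchy(sizes, groups)
n = len(level3)
density = len(edges) / (n * (n - 1) / 2)
print(n, round(density, 3))
```

With uniform sizes the overall density comes out somewhat below the 0.182 quoted for the heterogeneous test system, which is expected since larger communities contribute more high-density pairs.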

In Fig. 2(a), the squares represent NMI averages over
all replica pairs (left axis). The inverted triangles represent
the mutual information I averages for the same
replica pairs (right axis). In Fig. 2(b), the triangles represent
VI averages over all replica pairs (left axis), and
the diamonds represent the Shannon entropy H averages
for the replicas (right axis). In both panels, the circles
represent the average number of clusters across the replicas
(right-offset axes). All parameters are plotted versus
the model weight γ where we use a logarithmic scale to
facilitate comparing the behavior of a large range of system
sizes from N = 16 nodes in Figs. 7 and 8 to as large
as N = 200 000 nodes in Figs. 1(b) and 4.

The extrema (ia,b) and (iia,b) are the correctly determined
levels 2 and 3, respectively, of the test hierarchy
depicted in Fig. 1(a). Peaks (ia) and (iia) have I_N = 1
and minima (ib) and (iib) have V = 0, which indicate
perfect correlations among the replicas for both levels of
the hierarchy. The "plateaus" in H and I are a second
indication of the significant system structure whose importance
will become more apparent in later examples.
The plateau in the average q [51] is also an important
indicator of system structure as used in [36]. However,
Figs. 3, 6, and 8 discussed later demonstrate that some
caution should be exercised when using the plateau criterion
(in H, I, or q) for determining multiresolution
structure.
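
To make the replica comparison concrete, the following minimal sketch computes the average NMI and VI over all pairs of replica partitions. It assumes the common definitions I_N = 2I/(H_A + H_B) and V = H_A + H_B - 2I with base-2 logarithms; these conventions and all function names are assumptions for illustration, not a transcription of the implementation used for the figures.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy H of a partition given as a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information I between two partitions of the same node set."""
    n = len(a)
    na, nb, nab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (na[x] * nb[y]))
               for (x, y), c in nab.items())


def nmi(a, b):
    """Normalized mutual information (assumed form 2I / (H_A + H_B))."""
    h = entropy(a) + entropy(b)
    return 1.0 if h == 0 else 2.0 * mutual_information(a, b) / h


def vi(a, b):
    """Variation of information (assumed form H_A + H_B - 2I)."""
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


def replica_averages(replicas):
    """Average NMI and VI over all distinct pairs of replica partitions."""
    pairs = list(combinations(replicas, 2))
    avg_nmi = sum(nmi(a, b) for a, b in pairs) / len(pairs)
    avg_vi = sum(vi(a, b) for a, b in pairs) / len(pairs)
    return avg_nmi, avg_vi


# Three replicas that agree up to a relabeling of the clusters.
agree = [[0, 0, 1, 1], [1, 1, 0, 0], [5, 5, 7, 7]]
# Three replicas where one solution disagrees with the other two.
disagree = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
avg_nmi_agree, avg_vi_agree = replica_averages(agree)
avg_nmi_dis, avg_vi_dis = replica_averages(disagree)
```

Replicas that agree up to a relabeling give an average NMI of 1 and an average VI of 0, the signature of a strongly defined resolution; any disagreement lowers the NMI and raises the VI.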

At level 3 in Fig. 1(a), the average number of externally
connected edges for each node is Z_out ≈ 32.0 with a
random noise component of Z_out^noise ≈ 19.8. Both of these
numbers are larger than the average number of internal
edges, Z_in ≈ 14.3. Despite this imbalance, the algorithm
easily identifies level 3 of the hierarchy because the external
edges (particularly those due to the random noise) are
not concentrated strongly enough into any one external
cluster. This behavior is important for smaller communities
on level 3 where Z_out is substantially larger than
Z_in, and it illustrates that the model is robust to noise
in the system.

The VI peaks at γ_1 = 0.111 and γ_2 = 0.435 in Fig. 2(b)
correspond to the average inter-community edge densities,
p_1 = 0.1 for sub-communities at level 2 and p_2 = 0.3
for sub-communities at level 3. Equation (10) relates the
minimum internal edge density p_in > γ/(γ + 1) for each
community in a solved partition. We can arrive at this
inequality, using inductive reasoning, by considering the
minimum inter-community edge density required for two
arbitrary communities A and B to merge. We apply the
relation as an equality (i.e., the energy difference between
the merged and unmerged states is approximately zero)
for the peak VI values at γ_1 and γ_2. The respective densities
are p_1^AB = 0.100 and p_2^AB = 0.303. These values
correspond closely to the constructed inter-community
densities p_1 and p_2 above. The local VI maxima show
that the "complexity" of the energy landscape increases at
resolutions where γ/(γ + 1) is equal to the mean inter-community
edge density. The more intuitive interpretation
is that the "complexity" of the energy landscape increases
substantially when the energy difference between
different states is approximately zero.
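
The two densities quoted above follow directly from applying the merge condition as an equality. The short check below is a sketch that only evaluates the expression for the two peak values of the model weight; the function name is illustrative.

```python
def merge_density(gamma):
    """Inter-community edge density at which merging costs zero energy,
    taken from the density relation applied as an equality."""
    return gamma / (gamma + 1.0)


# Peak VI positions from the text and the constructed densities they match.
for gamma, constructed in [(0.111, 0.1), (0.435, 0.3)]:
    print(f"gamma = {gamma}: density = {merge_density(gamma):.3f} "
          f"(constructed {constructed})")
```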



B. Erdős-Rényi random graph

In Fig. 3, for comparison purposes we show the results
for a purely (Erdős-Rényi) random graph at the same average
edge density p = 0.182 as the hierarchy in Figs. 1(a)
and 2. We use eight replicas and four trials per replica
at a total run time of about 6.9 sec [33]. The only peak
(ia) in the random graph corresponds to a trivial division
into groups with sizes of approximately {1,2,253}
among the various replica solutions. This peak indicates
transitional behavior to lower density, essentially trivial,
structures. Peaks such as (i) can be distinguished from
more meaningful ones by the cluster size distribution or
the corresponding information measures. The values of I
at (ia) and of V and H at (ib) are all very low.
Otherwise, the random graph displays no
significant multiresolution structure.

All of the information measures display a plateau behavior
at (iia,b). The plateaus in NMI or VI do not indicate
a clear multiresolution structure because the correlations
are relatively poor (I_N = 0.70 and V ≈ 3.6)
for both measures. If we examine the detailed solutions
across the plateaus (separate from our multiresolution
algorithm), the average NMI and VI are I_N = 0.644
and V = 4.04, both of which indicate poor agreement.
There is no consistent structure identified by the community
detection algorithm in this region. Instead, the
weak plateaus in NMI and VI indicate that the system is
constrained within a set of similarly sized partitions that
have similarly high community edge densities. This example
also illustrates that if we use only the plateaus (in
H, I, or q), there is a potential to incorrectly identify significant
structure(s) in the system. This possibility can
be remedied by information checks on nearby solutions
in the plateau, but the poor NMI and VI correlations already
appear to indicate the lack of consistent structure
in the region.
FIG. 3: (Color online) Plot of information measures I_N, V,
H, and I in panels (a) and (b) vs the Potts model weight γ
for a purely (Erdős-Rényi) random graph that has the same
average density p = 0.182 as the hierarchy in Fig. 1(a) and the
corresponding results in Fig. 2. The right-offset axes plot the
number of clusters q. See Fig. 2 for a complete description of
the legends and axes. In panel (a), the peak (ia) corresponds
to a trivial partition of the system into groups with sizes of
approximately {1,2,253} among the different replicas. The
trivial structure change in the NMI spike is indicated by
the low value of mutual information I at (ia) and by its low VI
V and Shannon entropy H at (ib). The plateaus at (iia,b)
do not correspond to a consistent multiresolution structure
as evidenced by the poor NMI and VI correlations. Rather,
they indicate multiple similarly sized configurations that have
similar community edge densities.



C. Large hierarchy 

A much larger hierarchy is depicted in Fig. 1(b). The
system has 200 000 nodes and 10 011 428 edges. Approximately
62% of these edges are due to random noise
between level 2 communities. For this system, p_1 =
0.000 31, but p_2 = 0.3 and p_3 = 0.9 are unchanged from
Fig. 1(a). There are 10 000 sub-communities at level 3
with sizes ranging from 6 to 37. Level 3 communities are
combined in groups of two to eight to form the 2500 communities
of level 2 with sizes ranging from 27 to 180. We
use eight replicas and two trials per replica with a run
time of about 4.6 hours [33]. In Fig. 4, extrema (ia,b)
exactly identify level 2 of the hierarchy with perfect NMI
and VI correlations, and extrema (iia,b) accurately identify
(I_N = 0.999 995 and V = 1.42 × 10^-4) all but 5



FIG. 4: (Color online) Plot of information measures I_N, V,
H, and I in panels (a) and (b) vs. the Potts model weight γ
for the large three-level heterogeneous hierarchy depicted in
Fig. 1(b). The right-offset axes plot the number of clusters
q. See Fig. 2 for a complete description of the legends and
axes. With the exception of 15 weakly connected nodes (out
of 200 000) and 5 merged clusters (out of 10 000) at (iia,b),
the extremal values of I_N and V at (ia,b) and (iia,b) both
accurately correspond to levels 2 and 3, respectively, of the
hierarchy depicted in Fig. 1(b).



merged clusters out of 10000 and 15 nodes out of 200000 
nodes for level 3. Due to random fluctuations, all of 
these nodes have a random connectedness of 50% or less 
for their intended communities. This result is therefore 
consistent with the model and algorithm. 
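
The roughly 62% noise fraction quoted for this system can be checked with a back-of-the-envelope estimate. The sketch below assumes, for the estimate only, that all level 2 communities have the mean size of 80 nodes (the actual sizes range from 27 to 180); variable names are illustrative.

```python
n_nodes = 200_000
n_edges = 10_011_428
p1 = 0.00031          # noise density between level 2 communities
n_level2 = 2500

# Assumption for the estimate only: every level 2 community has the mean size.
mean_size = n_nodes / n_level2

total_pairs = n_nodes * (n_nodes - 1) / 2
pairs_inside_level2 = n_level2 * mean_size * (mean_size - 1) / 2
expected_noise_edges = p1 * (total_pairs - pairs_inside_level2)
noise_fraction = expected_noise_edges / n_edges
print(f"{noise_fraction:.2f}")  # prints 0.62
```

Node pairs inside level 2 communities are a negligible share of all pairs at this system size, so the estimate is insensitive to the uniform-size assumption.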



D. Dolphin social network 

We tested a social network of 62 bottlenose dolphins
in Doubtful Sound, New Zealand [52], [53], [54]. Three of
the strongest partitions ((i), (iv), and (v)) are depicted
in Fig. 5 using the results in Fig. 6. We use ten replicas
with ten trials per replica at a total run time of about 0.78
sec [33]. We use a density scaling of 0.8 rather than 0.75
for pi < 0.1 for Step [J] of the algorithm in order to more
easily observe the transition between structures (i) and
(ii) in Fig. 6. Configuration (i) identifies a grouping of
21 and 41 dolphins with perfect NMI and VI correlations
(I_N = 1 and V = 0). This configuration agrees with an
observed split of the dolphin network when a dolphin left
the school [52], but our algorithm also suggests that this
FIG. 5: (Color online) Pictorial representation of a social
network of 62 bottlenose dolphins in Doubtful Sound, New
Zealand [52], [53], [54]. These groupings correspond to structures
(i), (iv), and (v) in Fig. 6 in order of smaller group sizes.
The two-cluster partition (i) corresponds to a known split of
the dolphin community [52]. In partition (iv), sub-groups are
assigned distinct node shapes except for circles which indicate
various one and two member groups. Structure (v) is
identified from configuration (iv) when the four highlighted
dyads of dolphins ({5,56}, {15,55}, {20,28}, and {40,52})
form distinct sub-groups. Note that sub-groups {7, 19, 30}
and {23, 36, 39} in (iv) have nodes that are separated in their
respective super-groups. These groups are examples of how
our algorithm does not restrict node assignments between different
resolutions, and they illustrate how the algorithm can
apply to general types of multiresolution structure.
FIG. 6: (Color online) Plot of information measures I_N, V, H,
and I in panels (a) and (b) vs. the Potts model weight γ for
a social network of 62 bottlenose dolphins in Doubtful Sound,
New Zealand [52], [53], [54]. A summary of results is depicted
in Fig. 5 for configurations (i), (iv), and (v). The right-offset
axes plot the number of clusters q. See Fig. 2 for a complete
description of the legends and axes. One notable grouping is
configuration (i) which corresponds to a known split of the
dolphin community [52]. The structures represented by (ii)
- (v) are other potential strongly defined partitions and are
explained in the text.



configuration is not the only strongly defined partition 
for the system. 

Our algorithm further identifies partitions (ii) - (v) as
important candidate partitions based on the strong NMI
and VI information correlations. Partition (ii) separates
weakly connected dolphins ({4}, {11}, {12}, {35}, {58},
and {46, 59}) in the larger super-group of Fig. 5 into
distinct sub-groups. Configuration (iii) is slightly less
well-defined with information correlations of I_N ≈ 0.980
and V ≈ 0.132. It separates weakly connected dolphins
({22}, {31}, {39}, {48}, and {32,60}) of the smaller
super-group of partition (i) and also begins a coarse division
of the larger super-group. Configuration (iv) is
perfectly correlated and is the first major reconfiguration
of both super-groups of structure (i). The data in
the three largest groups of (iv) are largely divided along
gender lines according to details presented in [53]. Configuration
(v) is a slight variation of (iv) with I_N ≈ 0.998
and V ≈ 0.0178 which separates four dyads of dolphins
({15,55}, {46,49}, {32,60}, and {20,28}) into distinct
groups. Among different tests, there is some variation in



the predicted groupings where a few nodes can be reas- 
signed between groups or separated into distinct commu- 
nities. Sub-groups {7,19,30} and {23,36,39} of config- 
uration (iv) have nodes that are split between the two 
super-groups of (i). These groups show that our algo- 
rithm does not restrict node assignments between differ- 
ent resolutions. This behavior allows our algorithm to 
solve general types of multiresolution structures. 

All measures show a strong plateau for configuration
(ia,b). The mutual information I shows weak plateaus
at (iia) and (iva) but no plateau at (iiia) and (va). Similarly,
the Shannon entropy H shows weak plateaus at
(iib) and (vb) but no plateau for (iiib) and (ivb). The
average number of clusters q as used in [36] also indicates
the presence of structures (ii) and (v), but it misses
partition (iv). Additionally, a weak plateau in q near
configuration (iii) predicts a slightly different resolution
than the extremal NMI and VI correlations. The weak
plateau behavior of H, I, or q at different configurations
of (iia,b) - (va,b) does not contradict the existence of valid
structures. Rather, missing plateaus in the supplemental
FIG. 7: (Color online) Pictorial representation of 16 Polopa
tribes of Highland New Guinea [55], [56]. Solid lines represent
allied relationships, and gray dashed lines represent antagonistic
relationships. The three main levels of the structure
are indicated by shaded areas. These groupings of tribes correspond
to structures (i), (ii), and (iv) in Fig. 8 in order
of smaller group sizes. Distinct node shapes (intermediate
grouping) also correspond to structure (ii). The three-cluster
structure (ii) corresponds exactly to the analysis in [55], [56].
Structure (iii) in Fig. 8 is formed when node 2 joins the group
at the bottom-right of the figure.



measures H, I, or q can indicate a noisy graph in general
or a strongly defined but transient resolution.



E. Highland Polopa tribe relations 

Figures 7 and 8 show the results for 16 Polopa tribes of
Highland New Guinea [55], [56]. These data feature allied,
neutral, and antagonistic relations between the sub-tribes
of the region. Hage and Harary [56] used symmetric edge
weights of +1 for allied relations, 0 for neutral relations,
and -1 for antagonistic relations in their analysis; but
these "intuitive" weight assignments are inconsistent if
extended to systems that include few or no antagonistic
relations (such systems would tend to "collapse" into
large groups). Therefore, our model uses the more consistent
assignments of -1 for "neutral" relations and -2
for antagonistic relations. Interestingly, Hage and Harary
[56] related the fact that the sub-tribes did not consider
the possibility of strictly neutral relations among tribes.
We use 12 replicas with 10 trials per replica to limit fluctuations
in this very small data set at a total run time of
about 0.46 sec [33]. We use an array data structure due
to the missing edge weights.

Figure 7 depicts configurations (i), (ii), and (iv) from
Fig. 8 in order of smaller group sizes. For presentation
purposes, we allow three additional resolutions to
be solved after the algorithm detects disjoint subgraphs
FIG. 8: (Color online) Plot of information measures I_N, V,
H, and I in panels (a) and (b) vs. the Potts model weight γ
for 16 Polopa tribes of Highland New Guinea. The results are
summarized in Fig. 7. The right-offset axes plot the number
of clusters q. See Fig. 2 for a complete description of the
legends and axes. The most important structure represented
in the figure is at (iia,b) where the strong correlations agree
exactly with data and analysis presented in [55], [56]. See the
text for a full discussion of the other structures indicated in
the figure.



at (ia,b). Our three-cluster partition (ii) agrees exactly
with those discussed in [56]. All configurations indicated
in Fig. 8 are strongly defined with I_N = 1 and V = 0.
The first configuration (i) is a two-cluster solution which
merges two sets of clusters of configuration (ii). The
small size of the system causes the transition between
configurations (i) and (ii) to be sharply defined. To resolve
the ambiguity, we must reference the plateaus in the
information measures H or I (or the number of clusters
q [1]).

Strong NMI and VI values at (iiia,b) and (iva,b) correspond
to two five-cluster solutions. These solutions subdivide
the three-cluster system into two slightly different
dense configurations of allied tribes. In configuration
(iii), node 2 is associated with the group on the bottom-right
of Fig. 7. In configuration (iv), all groups are cliques
(maximally connected sub-graphs). Both NMI and VI
detect the transition between (iii) and (iv) with a short-lived
spike. The information measures H and I also show
the transition with plateaus at different values. Here, the
number of clusters q does not detect the transition since 
q does not actually change. Again, this is due to the lim- 
FIG. 9: (Color online) A sample graph with N = 1000 nodes
from the new benchmark proposed in [57]. For presentation
purposes, this depiction uses μ = 0.05. Other parameters are
α = 2, β = 1, ⟨k⟩ = 15, and k_max = 50 (see text).



ited variability in this system, but the same ambiguity 
occurs in Fig. [3] for all three supplemental measures H, 
I, and q. 



VII. ACCURACY 

In Figs. 9-11 we test the accuracy of the multiresolution
algorithm of Sec. V with a recently proposed benchmark
in [57]. An example graph with N = 1000 nodes
is depicted in Fig. 9. This new benchmark can pose a
significant challenge since it incorporates a more realistic
heterogeneous distribution of community sizes and node
degrees, and it allows for testing across a large range of
system sizes. It divides a set of N nodes into q communities
with sizes assigned according to a power-law distribution
with an exponent β. The community sizes are
optionally constrained by minimum and maximum sizes
of n_min and n_max. The degrees of the nodes are also
assigned in a power-law distribution with an exponent α
with constraints specified by the maximum degree k_max
and the mean degree ⟨k⟩. The minimum degree k_min is
set so that the distribution gives the correct mean ⟨k⟩. A
fraction (1 - μ) of the edges of each node are connected
to nodes within their own communities. The remaining
fraction μ are assigned to nodes in other communities.
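
The mixing parameter described above can be measured on any graph with a known partition. The following is a minimal sketch (not the benchmark generator itself) that computes, for each node, the fraction of its edges that leave its own community; the function name and the toy graph are hypothetical.

```python
def mixing_fraction(edges, community):
    """Per-node fraction of edges that connect to other communities.

    edges: iterable of undirected (u, v) pairs.
    community: dict mapping each node to its community label.
    """
    degree, external = {}, {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            degree[a] = degree.get(a, 0) + 1
            if community[a] != community[b]:
                external[a] = external.get(a, 0) + 1
    return {node: external.get(node, 0) / k for node, k in degree.items()}


# Toy example: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
mu_per_node = mixing_fraction(toy_edges, toy_community)
mu_mean = sum(mu_per_node.values()) / len(mu_per_node)
```

In the toy graph only the two bridge nodes have a nonzero external fraction (one of their three edges), so the mean over all six nodes is 1/9.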

We test systems with N = 1000 and 5000 nodes and
power-law exponents of α = 2 and 3 for the degree distribution
and β = 1 and 2 for the community size distribution.
We do not specify the optional community size
constraints n_min or n_max, allowing the benchmark program
to specify them by the degree distribution. The
node degree distribution is specified by ⟨k⟩ = 15 and
k_max = 50 where the mean degree ⟨k⟩ = 15 was the most
difficult of the tested values in [57]. We vary the mixing
parameter μ in the range 0.1 < μ < 0.7. The accuracy
FIG. 10: (Color online) Plot of information measures I_N, V,
H, and I in panels (a) and (b) vs. the Potts model weight
γ for a single realization of the benchmark suggested in [57].
The right-offset axes plot the number of clusters q. See Fig. 2
for a complete description of the legends and axes. Figure 9
depicts a sample system from the benchmark (with a different
mixing parameter μ) showing a distribution of community
sizes. This example plot is for N = 1000 at μ = 0.5 where 50%
of each node's edges on average are connected to communities
other than its own. We use α = 2 and β = 1 for the power-law
distribution exponents of the node degrees and the community
sizes, respectively. Using the algorithm in Sec. V, we
identify the strongest NMI and VI replica correlations among
the different resolutions as the "best" answer for the graph.
For this graph at μ = 0.5, there is only one extremal value of
I_N and V which indicates that there is only one "best" resolution
for the defined system (see also Appendix E). Note that
these information values are the averages among the replicas.
The full accuracy plot in Fig. 11 plots the average I_N between
the "best" partitions and the known benchmark graphs for a
range of the mixing parameter μ.



results are summarized in Fig. 11.

We apply the multiresolution algorithm of Sec. V to
identify the "best" system partition. Figure 10 shows an
application of the algorithm for a single benchmark graph
with N = 1000, μ = 0.5, α = 2, and β = 1. In this plot,
we identify the "best" system resolution by the strongest
average NMI correlation between all pairs of replicas. We
use r = 8 replicas with t = 4 energy optimization trials
per replica. As seen in Fig. 10, both I_N and V (almost always)
show only one extremal value which is the strongly
defined system at (ia,b). Plateaus in H, I, and q qualitatively
confirm the structure indicated by the extrema
FIG. 11: (Color online) A plot of I_N vs. μ for a new benchmark
problem proposed in [57]. I_N is calculated between
the solved answer, by means of the multiresolution algorithm in
Sec. V using the absolute Potts model of Eq. @, and the
constructed benchmark graphs. An example multiresolution
analysis for one generated graph is in Fig. 10. μ is the fraction
of edges of each node (on average) that are assigned outside
its own community. We tested the power-law distribution exponents
α = 2 and 3 and β = 1 and 2 for the node degrees
and the community sizes, respectively. For comparison, we
also plot the results from [57] determined by modularity optimization
(Q-opt) using simulated annealing. With the APM,
our multiresolution algorithm demonstrates extremely high
accuracy for large systems (see text). Appendix E discusses
the accuracy perturbations in panels (a) and (b) for N = 5000
nodes. Data for N = 1000 and N = 5000 nodes are averaged
over 100 and 25 graphs, respectively.



in I_N and V. From these data, we determine that there
is only one "best" resolution for the defined system. See
Appendix E for additional considerations in identifying
the "best" benchmark resolution.
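
One simple way to automate the selection of candidate "best" resolutions is to look for values of the model weight where the replica-averaged NMI has a local maximum and the replica-averaged VI has a local minimum. The sketch below is a hypothetical selection rule written for illustration; it is not the procedure used for the figures, and the sample curves are made-up numbers.

```python
def strongest_resolutions(gammas, nmi_avg, vi_avg):
    """Return model weights at joint NMI maxima / VI minima, strongest first.

    gammas, nmi_avg, vi_avg: equal-length sequences sampled over resolutions.
    Endpoints are skipped because they have only one neighbor.
    """
    picks = []
    for k in range(1, len(gammas) - 1):
        nmi_peak = nmi_avg[k] >= max(nmi_avg[k - 1], nmi_avg[k + 1])
        vi_dip = vi_avg[k] <= min(vi_avg[k - 1], vi_avg[k + 1])
        if nmi_peak and vi_dip:
            picks.append((nmi_avg[k], -vi_avg[k], gammas[k]))
    # Highest NMI first; ties are broken by the lowest VI.
    picks.sort(reverse=True)
    return [gamma for _, _, gamma in picks]


# Made-up replica averages over seven resolutions (illustration only).
sample_gammas = [0.1, 0.2, 0.5, 1.0, 2.0, 4.0, 8.0]
sample_nmi = [0.80, 0.95, 1.00, 0.90, 0.97, 0.85, 0.60]
sample_vi = [0.9, 0.3, 0.0, 0.5, 0.2, 0.8, 1.9]
best = strongest_resolutions(sample_gammas, sample_nmi, sample_vi)
```

A rule of this kind returns every joint extremum, so graphs with more than one local extremum still require a further check (for example, the plateau behavior of H, I, or q) to choose among the candidates.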

In Fig. 11, we identify the "best" partition for a set
of benchmark graphs over a range of the mixing parameter
0.1 < μ < 0.7. We then compare each solution via
NMI with the "known" partition. We average over 100
graphs for N = 1000 and over 25 graphs for N = 5000
for each tested μ. For comparison, we also include the
results given in [57] for modularity optimization using a
simulated annealing algorithm. Combined with the APM
of Eq. ([pj), our multiresolution algorithm performs excellently,
achieving almost perfect accuracy for each tested
distribution exponent α and β and for a large range of
the mixing parameter μ. The accuracy perturbations in
panels (a) and (b) for N = 5000 nodes are due to benchmark
graphs with more than one local extremum in I_N
and V. These perturbations are a result of the automated
selection of the single "best" resolution based on
I_N and V extrema. We can largely eliminate them by a
simple extension of the basic multiresolution algorithm



(see Appendix E). They are also nearly eliminated for
these values of N if we specify the default community
size constraints of n_min = 20 and n_max = 50.

The absolute Potts model has little difficulty accu- 
rately solving the harder problem with N = 5000 nodes 
because the edges connected to external communities are 
spread over more communities on average. This con- 
struction causes a greater contrast of interior and ex- 
ternal edge densities (considering edges connecting pairs 
of communities). This larger contrast allows the bench- 
mark graph to be easily identified by the multiresolution 
algorithm. The converse occurs for small systems in the 
benchmark. 

Our multiresolution algorithm has some difficulty in
identifying all communities in this benchmark for exceptionally
small systems (N < 300) where we achieve
I_N = 1.0 for a range of μ that increases with N (for
N = 300, I_N = 1.0 for μ < 0.45). Communities are
partitioned locally, independent of any global parameters
of the system; so this limitation is not a resolution limit
effect. Rather, this behavior is due to simultaneously resolving
communities with substantially different relative
densities [58]. Palla et al. [15] stated that the community
density should be used in identifying communities, which
our Potts model does in effect. In Sec. II we suggested
that it is the typical community edge density that characterizes
the resolution of a partition. The difficulty in this
benchmark is due to defining communities by the fraction
of each node's edges (1 - μ) that lie within its own community.
Each community contains l_s = n_s⟨k⟩(1 - μ)/2
edges on average where n_s is the size of community s.
The average edge density p_s of community s is

    p_s = \frac{\langle k \rangle (1 - \mu)}{n_s - 1}.    (12)




The numerator is constant on average across all communities.
Our Potts model solves heterogeneously-sized
systems well (see Secs. VI A and VI C), but one notable
implication of Eq. (12) is that the realistic distribution
of community sizes leads to a substantial distribution
of community edge densities with substantially different
character for this benchmark.
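
To give a sense of the spread in densities implied by Eq. (12), the sketch below evaluates it at the two default community size limits mentioned above (20 and 50 nodes) for a mean degree of 15 and a mixing parameter of 0.5. The function name is illustrative, and the choice of these two sizes is only an example.

```python
def community_density(n_s, mean_degree=15, mu=0.5):
    """Average internal edge density of a community of size n_s, per Eq. (12)."""
    return mean_degree * (1 - mu) / (n_s - 1)


for n_s in (20, 50):
    print(f"n_s = {n_s}: p_s = {community_density(n_s):.3f}")

density_ratio = community_density(20) / community_density(50)
```

The smallest allowed community is more than 2.5 times denser than the largest one at the same mixing parameter, which illustrates why a single resolution must accommodate communities of quite different character.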

Note also that our highly accurate results for μ = 0.6
and 0.65 for most values of N, α, and β in Fig. 11 show
that the concept of a weak community structure [59],
where some nodes have more total edges connected to
other communities than within their own, is not too
restrictive because the external edges can be dispersed
among many other communities. Indeed for μ > 0.5, all
clusters in this benchmark on average exceed the definition
of a weak community since most, if not all, nodes
have more exterior than internal edges. So-called weak
communities can occur frequently in social networks, for
example. Individuals often know far more people than
the size of the local "community" group(s) (friends, associates,
etc.) of which they are members. We showed a
similar, but more striking, result when identifying level 3
of the constructed hierarchy in Figs. 1(a) and 2 where the
smallest communities had many more external than internal
edges. Nevertheless, the model could easily resolve
the communities at the correct resolution.



VIII. DISCUSSION 

In Figs. [5] and US strong correlations in NMI 
and VI appear to be consistent indicators of important
multiresolution structures. In most cases the assessments
of the "best" partitions are confirmed by "plateaus" in
the mutual information I and the Shannon entropy H.
These information plateaus are similar to those seen in
the number of clusters q in [36] and that are also observed
in our data [51]. In Ref. [36], Arenas et al. indicated
that plateaus in q correspond to the most relevant system
structures. Our results largely affirm but also extend that
observation.

In many pertinent applications of our algorithm, the
final results (including, by fiat, our synthetic networks in
Secs. VI A and VI C) are indeed hierarchical in the conventional
sense. That is, solving the Hamiltonian of Eq.
© anew with a different model weight γ may break the
communities apart, but it does not swap vertices between
different communities at the correct resolutions. As each
resolution is solved independently in our algorithm, we
may (and indeed do) find more complicated multiresolution
partitions where node reassignments lead to overlaps
between communities that are perhaps disjoint on
another level. This latter case is more subtle and appears
in systems such as the dolphin social network of
Sec. VI D and other individually oriented networks.

Variations in run time scaling among the different tests
are influenced, sometimes strongly, by different levels of
effective noise in each system (aside from differing numbers
of replicas and trials; see Appendix B). For example, the
hierarchy for Fig. 2 had a run time of 6.1 s. The corresponding
random graph in Fig. 3, with nearly the exact
same density and number of nodes, finished in 6.9 sec.

NMI and VI possess different strengths for quantitatively
assessing multiresolution structure. (1) Of course,
NMI is normalized and VI is not (although one normal- 
ization for VI is l/log 2 [HI). Both of these features 
are useful. (2) Figures [2HH show that VI more clearly 
identifies poor configurations. In the high density regime 
(7 5) of Figs. [2] and [U NMI shows a lower correla- 
tion compared to the peak values at (i) and (ii); but VI 
clearly indicates poor agreement. In Fig. [5J VI in panel 
(b) visually indicates a much poorer correlation in the 
7 ~ 0.3 region as compared to NMI in panel (a). (3) In 
Fig. 3(a), we identified peak (ia) as a "trivial" division
with a huge component weakly connected to some small
branch elements. If one were actually interested in identifying
these very low-density solutions, NMI does identify
them. In panel (b), V and I simply indicate a very
low-information configuration.

In many cases, extrema in either NMI or VI are sufficient
to identify the multiresolution structure of a system.
Occasionally, we need to additionally reference the
mutual information I or the Shannon entropy H (or the
number of clusters q [36]). For example, in Fig. 2 NMI
and VI almost do not distinguish between the γ = 0.83
partition (the exactly correct one) and the γ = 1.6 partition
(one weakly connected node separates to form a new
community) because the separation between the two configurations
is almost imperceptible. Both of these partitions
correspond to level 3 of the hierarchy depicted in
Fig. 1(a), and both partitions have perfect correlations
(I_N = 1 and V = 0). In this case, the small changes
in information measures H and I indicate a redundant
γ = 1.6 partition. Also in Figs. 10 and 11 we used the
plateau to distinguish, when needed, between strongly
correlated transient partitions (due to random elements
of the benchmark generation process) and the more stable
partition corresponding to the intended solution.

A similar challenge can occur for very small systems,
such as in the transition from (i) to (ii) in Fig. 8, or for
systems with few intercommunity connections. As the
resolution is adjusted in these systems, variability can
be more limited; and system transitions can be sharply
defined. For these systems, it is possible that the NMI
and VI correlations can remain strong and constant while
crossing a structural transition. In Fig. 8 we avoid this
ambiguity by noting that H and I clearly show a transition
between structures (i) and (ii). Such systems can
also accentuate the perceived plateaus in the multiresolution
data because the variation in different configurations
is small and transitions between major configurations can
be sharp.

Given these distinctions, the two evaluations of multiresolution
structure ("plateau" behavior in H, I, and q or
strongly defined I_N and V correlations) are complementary.
While the plateau behavior is important, it is a
more qualitative assessment of the "best" resolutions for
the system. At least for our Potts model, under some
conditions the plateaus in H, I, or q can be weak enough
to prevent their use as a universal indicator of
multiresolution structure. In Fig. [5], the plateaus even
corresponded to a set of similarly sized partitions with
similar densities rather than to a consistent structure. The
NMI and VI approach can more easily identify short-lived,
but nevertheless strongly defined, structures (such
as configuration (iv) in Fig. [5]) that the plateau criterion
can miss. In all Figs. [2] - [H [5J [5J and [JD1 the major 
benefit of using the NMI and VI evaluations is that they
appear to give a quantitative estimate of the "best" resolutions.
Together, the information measures appear to
provide a consistent, accurate, and quantitative method
of identifying general multiresolution structure.
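
The replica comparison described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used for the figures: it assumes each replica is given as a list of community labels (one per node), and it assumes the common normalization I_N = 2I/(H_A + H_B) and V = H_A + H_B - 2I. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy H = -sum_k (n_k/N) log2(n_k/N) of one partition."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A,B) = H(A) + H(B) - H(A,B); the joint entropy uses paired labels."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def nmi(a, b):
    """Normalized mutual information 2I/(H_A + H_B) (assumed normalization)."""
    h = entropy(a) + entropy(b)
    # Convention chosen here: two single-cluster partitions agree perfectly.
    return 1.0 if h == 0 else 2.0 * mutual_information(a, b) / h


def vi(a, b):
    """Variation of information V = H_A + H_B - 2I."""
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


def replica_correlations(replicas):
    """Average NMI and VI over all pairs of replicas at one resolution."""
    pairs = list(combinations(replicas, 2))
    mean_nmi = sum(nmi(a, b) for a, b in pairs) / len(pairs)
    mean_vi = sum(vi(a, b) for a, b in pairs) / len(pairs)
    return mean_nmi, mean_vi
```

Calling `replica_correlations` once per resolution gives the two curves whose extrema (high average NMI, low average VI) mark candidate resolutions. Replicas that differ only by a relabeling of the communities are treated as identical.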

In further work, we will also consider a different
method of adjusting the resolution of the system using
the Hamiltonian

$$
\mathcal{H}(\{\sigma\}) = -\frac{1}{2} \sum_{i \neq j} \left[ (a_{ij} + \alpha_{ij}) A_{ij} - (b_{ij} + \beta_{ij}) J_{ij} \right] \delta(\sigma_i, \sigma_j) \tag{13}
$$

where α_ij and β_ij are the new model weights as compared
to γ in Eq. (9). This variable topology Potts Hamiltonian
is a generalized and continuous version of threshold
cut-offs in weighted graphs. It presents an alternative
method of continuously scaling the system by using an
additive rather than a multiplicative scaling. It differs
from Eq. (9) in that it progressively adjusts the topology
of the system, whereas multiplicative scaling does not
change the system's connectedness. Additive scaling may
provide a different perspective on the evolution of the
system structure over different scales, and it may better
simulate how some real-world models are "stressed."
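
As a rough illustration of the additive scaling, the sketch below evaluates an energy of the form of Eq. (13) for a small unweighted graph. It rests on several assumptions that are not stated in this section: the weights are uniform (a_ij = a, b_ij = b, α_ij = α, β_ij = β), and J_ij = 1 - A_ij marks a missing edge. The function name and parameters are hypothetical.

```python
def potts_energy(adj, sigma, alpha=0.0, beta=0.0, a=1.0, b=1.0):
    """Energy -1/2 sum_{i != j} [(a+alpha) A_ij - (b+beta) J_ij] delta(s_i, s_j).

    adj   : symmetric 0/1 adjacency matrix (list of lists)
    sigma : community label of each node
    Assumes uniform weights and J_ij = 1 - A_ij (missing-edge indicator).
    """
    n = len(adj)
    energy = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or sigma[i] != sigma[j]:
                continue  # only same-community pairs contribute
            a_ij = adj[i][j]
            j_ij = 1 - a_ij
            energy += -0.5 * ((a + alpha) * a_ij - (b + beta) * j_ij)
    return energy


# A triangle (nodes 0, 1, 2) with a pendant node 3 attached to node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
split = [0, 0, 0, 1]    # triangle alone in one community
merged = [0, 0, 0, 0]   # everything in one community
```

In this toy case, raising β increases the penalty for the two missing edges in the merged partition, so the split partition becomes relatively more favorable, while raising α rewards every intra-community edge.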

Additionally, it may be possible to probe the system
at a local level either by using localized partitions or by
analyzing details within the confusion matrix at each resolution.
With this approach, we may be able to identify
stable, but localized, structures beyond the information
conveyed in the global information-based correlations.

We discovered, and will report in detail in an upcoming
publication, a new sharp crossover between typical-easy
and rare-hard community detection problems [46].
Our finding of a community detection transition constitutes
an analog of the singular transition, or more precisely,
a singular region in the k-SAT (satisfiability) problem.
Mézard et al. [60] found that the hardest problems
occur along well-defined loci in the phase diagram of random
satisfiability problems. These loci of hard problems
separate the SAT region (of satisfiable random problems)
and the overly constrained UNSAT region (in which the
constraints cannot all be simultaneously satisfied). We
ascertained a similar phenomenon within community detection.
See Appendix [B] for a summary of one facet of
this transition.

Qualitatively, the analog of the SAT region is a com- 
mon "easy" and "fast" community detection region. A 
"transition" region, where computational cost rapidly in- 
creases and accuracy rapidly decreases, corresponds to 
the singular region of the k-SAT problem. A "hard" and 
"slow" community detection region corresponds to the 
UNSAT region of the k-SAT problem. For some commu- 
nity detection problems, the convergence rate can accel- 
erate in the hard region due to the problem being rapidly 
trapped by local energy minima. 

In a future work, we will detail the minimization of the
"free energy" type functional of Eq. (11). This functional
contains both the Potts model energy and the composite
information function. This latter information theory
measure is maximized when the correlation between
replicas is maximal.



IX. CONCLUSION 

We use a Potts model measure for community detection
and apply it to detecting multiresolution structures:
(1) Our approach identifies and quantitatively
evaluates the "best" multiresolution structure(s), or lack
thereof, in a graph. (2) All resolutions are solved independently,
so the algorithm allows for the identification
of completely general types of multiresolution structure.
(3) It is based on information comparisons, so in
principle it should apply to any community detection
model that can examine different resolutions. (4) The
underlying Potts model and algorithm are as accurate as
the best methods currently available (see Appendix [A]).
The model is a local measure of community structure,
so it is free from the "resolution limit" as discussed in
the literature [12, EI EE El ES E3]- (5) Building on 
this foundation, the multiresolution algorithm demonstrates
extremely high accuracy for large systems using
a recent benchmark proposed in [57] (see Sec. [VII]).
(6) We estimate that the computational cost scales as
O(N^{1+β} Z^{1+β} r t log N log Z) for some small β [H, S3
where r is the number of replicas, t is the number of optimization
trials per replica, Z is the average node degree,
and N is the number of nodes. We have tested our community
detection algorithm on systems as large as O(10^7)
nodes and O(10^9) edges (see Appendix [C]) [33]. The multiresolution
algorithm requires a substantial number of
individual community solutions; but due to the speed
of the underlying algorithm, it can nevertheless examine
systems of over O(10^5) nodes and O(10^7) edges on a single-user
workstation. The algorithm should extend very efficiently
to parallel or distributed computing methods, allowing
larger systems to be studied.

APPENDIX A: ACCURACY OF THE
COMMUNITY DETECTION ALGORITHM

We demonstrate the accuracy of the community detection
algorithm of Sec. [IV], which is used to calculate the
individual replica solutions in Step [2] of the multiresolution
algorithm discussed in Sec. [V]. Our results using this
frequently used model problem from the literature were previously
presented in [l^]. The constructed model has 128 nodes
divided into 4 clusters with 32 nodes each. For each node,
Z_in edges are randomly connected to other nodes within
its own community and Z_out edges are randomly connected
to nodes in one of the other three communities.
The total degree of each node is Z = Z_in + Z_out, where
we require an average degree of Z = 16.
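
A benchmark graph of this kind could be generated along the following lines. This is a sketch under a stated assumption: edges are drawn independently with probabilities p_in = Z_in/31 and p_out = Z_out/96, so node degrees fluctuate around the targets rather than matching them exactly. The text above only says that edges are randomly connected, so the exact construction used for Fig. [12] may differ. The function name and parameters are hypothetical.

```python
import random


def make_benchmark(z_out, z_total=16, n_groups=4, group_size=32, seed=0):
    """Return (labels, edges) for a planted-partition graph.

    Each node expects z_total - z_out neighbours inside its own group
    and z_out neighbours spread over the other groups.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    labels = [i // group_size for i in range(n)]
    z_in = z_total - z_out
    p_in = z_in / (group_size - 1)      # 31 possible internal neighbours
    p_out = z_out / (n - group_size)    # 96 possible external neighbours
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return labels, edges


labels, edges = make_benchmark(z_out=6.0)
mean_degree = 2 * len(edges) / len(labels)
```

Sweeping `z_out` from small values toward 16 raises the noise level in the graph while holding the expected total degree fixed.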

The task is to verify the defined community structure.
In Fig. [12], we use γ = 1 in Eq. (9) with q constrained
to four. We plot the percentage of correctly identified
nodes p versus the average number of externally connected
edges per node Z_out. We use the same measure
of the "percentage" of correctly placed nodes as Ref. [19]

FIG. 12: (Color online) Reproduced from Ref. Q3]. A plot
of the percentage of correctly identified nodes p versus Z_out,
the average number of edges that each node has connected to
nodes outside of its own community. The average number of
total edges per node is Z = 16. The benefit of extra trials t
reaches a point of diminishing returns around t = 10 for many
tests, and it is the intermediate difficulty trials (8 < Z_out < 9)
that benefit the most from the additional optimization trials.
Note that the accuracy of our APM of Eq. (9) and algorithm
in Sec. [IV] is at least equal to that of the best algorithms. Each point
is averaged over 500 systems.

FIG. 13: (Color online) A plot of the susceptibility χ_n =
p(t = n) - p(t = 4) versus Z_out, the average number of edges
that each node has connected exterior to its own community.
χ_n is the percentage increase in the accuracy of each test as
the number of trials t = n is increased from n = 5 to n = 100.
The average number of total edges per node is Z = 16. p
is the percentage of correctly identified nodes from Fig. [12].
The curves are spline fits and are intended for visualization
purposes only. Additional trials are unnecessary in the easy
region Z_out < 7. The benefit of extra trials is largest in the
short transition region 8 < Z_out < 9. Afterwards, the benefit
diminishes into the hard region Z_out > 9.5 where the accuracy
improvement is small even with a large number of attempted
optimization trials.



within [61]. Four sets of data in Fig. [12] were assimilated
by Boccaletti et al. [22]. Simulated annealing proved
to be the most accurate algorithm in [22], although it is
computationally expensive. Hastings [13] and Gudkov et
al. [21] also demonstrated accurate results.

Our results in Fig. [12] use an older, slower version
(without the neighbor-node search) of our algorithm. For
a small system of only 128 nodes and q = 4 by constraint,
the difference in run time would be small. For many of
the tests, the benefit of extra trials t reaches a point of diminishing
returns by t = 10. High-noise systems rapidly
trap different replicas in local energy minima, so it is the
"intermediate" difficulty problems (8 < Z_out < 9) that
benefit the most from additional optimization trials. Our
method maintains an accuracy rate at least equal to that of the
best available algorithms. In particular, it maintains a
95% or better accuracy rate up to Z_out = 7.5.



APPENDIX B: TRANSITION EFFECTS OF 
NOISE LEVEL ON COMMUNITY DETECTION 
ACCURACY 

The benchmark problem that serves as the basis for the
data in Fig. [12] is discussed in detail in Appendix [A]. In
Fig. [13], we plot, for several numbers of trials n, the "susceptibility"
χ_n = p(t = n) - p(t = 4) versus Z_out, the
average number of edges that each node has connected
exterior to its own community. The average number of
total edges per node is Z = 16. p is the percentage of correctly
identified nodes from Fig. [12] (see Ref. [19] in [61]),
and t is the number of trials at each test. The ordinate
χ in Fig. [13] is the percentage improvement in accuracy
based on the number of optimization trials that are used.

As Z_out increases, the noise in the system increases.
Figure [13] illustrates how the noise in the system affects
the effort required to solve the system as accurately as
possible. The benefit of extra optimization trials is negligible
for the easy region up until about Z_out = 7. Additional
trials become more important for a short transition
region (8 < Z_out < 9). Afterwards, the benefit of
additional trials quickly reaches a point of diminishing
returns in the hard region Z_out > 9.5, where it fails to
produce large improvements in accuracy despite significantly
more computational effort.

As the number of trials n increases, the "susceptibility"
χ_n progressively exhibits a more pronounced peak. Such
a trend is also evidenced in the susceptibility of finite-size
physical systems. We have also identified a similar and
related dynamic feature of the transition that is quantified
by the increased computational time required for
a single solution [46] (beyond any added computational
cost due to extra energy optimization trials).

APPENDIX C: COMMUNITY DETECTION OF A
LARGE SYSTEM

We tested our community detection algorithm of Sec.
[IV] using the neighbor-node search on a synthetic network
with over one billion links. We generated a random
set of N = 40 million nodes separated into 1.25 million
heterogeneously sized communities with sizes ranging
from 10 to 62 nodes. (Note that it is the number
of edges that limits the calculation as opposed strictly
to the number of nodes.) The random edge connection
probability within the communities was p_in = 0.9. Nodes
in different communities were connected with a probability
of p_out = 5.31 × 10^{-7}. Each node has an average
number of interior and exterior edges of Z_in ≈ 29 and
Z_out ≈ 21. The total number of edges was 1 000 211 862.
The average density of the graph was ρ = 1.25 × 10^{-6}.

We used γ = 1 in Eq. (9) and the algorithm in Sec.
[IV] to solve the system. There were 13 nodes that were
not placed within their intended communities. These are
likely due to random initialization fluctuations. The information
correlations between the "known" and solved answers
were I_N = 1.00 and V = 1.85 × 10^{-6} with
V_max = log_2 N ≈ 25.3. Both of these measures indicate
very strong agreement. The total calculation time
was 3.7 h on a single processor [33].
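
The figures quoted in this appendix can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the approximation Z_out ≈ p_out N assumes that each community is negligible in size compared with N, and the variable names are hypothetical.

```python
from math import log2

N = 40_000_000                  # number of nodes
p_out = 5.31e-7                 # inter-community edge probability
z_in, z_out = 29, 21            # quoted average interior/exterior degrees
edges_reported = 1_000_211_862  # quoted total number of edges

z_out_est = p_out * N                       # expected exterior degree
edges_est = N * (z_in + z_out) / 2          # each edge is shared by two nodes
density = edges_reported / (N * (N - 1) / 2)
v_max = log2(N)                             # upper bound on VI in bits

print(f"Z_out ~ {z_out_est:.2f}, L ~ {edges_est:.3g}, "
      f"density ~ {density:.3g}, V_max ~ {v_max:.2f}")
```

The estimates agree with the quoted values: an exterior degree near 21, about 10^9 edges, a density near 1.25 × 10^{-6}, and V_max near 25.3.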



APPENDIX D: GENERALIZATION OF THE 
INFORMATION-BASED REPLICA METHOD 

In Sec. El we may recast the information theory mea- 
sures used to evaluate the correlation between differ- 
ent replicas for other (non-graph theoretic) optimization 
problems with general Hamiltonians (or cost functions) 
ℋ. An alternate form of Eq. (2) for the mutual information
between replicas i and j is



$$
I(i,j) = H(i) + H(j) - H(i,j) \tag{D1}
$$



where H(i), H(j), and H(i,j) denote the entropies of
replica i, replica j, and the combined system formed by
both replicas, respectively. Instead of using Eq. (2), we
write the Shannon entropy H(i,j) for the combined replicas
i and j, which we then apply in Eq. (D1). For general
Hamiltonians ℋ, we replace H(i), H(j), and H(i,j) by
a thermodynamic entropy for the respective systems.
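
For graph partitions, the entropy form of the mutual information can be checked numerically against the usual confusion-matrix sum. The sketch below is illustrative only: it assumes each replica is a list of community labels, and it assumes that the confusion-matrix expression I = Σ (n_ij/N) log_2[n_ij N/(n_i n_j)] is the standard form that the entropy expression replaces. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """H = -sum_k (n_k/N) log2(n_k/N) over the distinct labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information_entropy_form(rep_i, rep_j):
    """I(i,j) = H(i) + H(j) - H(i,j), with H(i,j) from the paired labels."""
    joint = list(zip(rep_i, rep_j))
    return shannon_entropy(rep_i) + shannon_entropy(rep_j) - shannon_entropy(joint)


def mutual_information_direct(rep_i, rep_j):
    """Confusion-matrix form: sum_ij (n_ij/N) log2(n_ij N / (n_i n_j))."""
    n = len(rep_i)
    n_i, n_j = Counter(rep_i), Counter(rep_j)
    n_ij = Counter(zip(rep_i, rep_j))
    return sum((c / n) * log2(c * n / (n_i[a] * n_j[b]))
               for (a, b), c in n_ij.items())


rep_i = [0, 0, 0, 1, 1, 1]
rep_j = [0, 0, 1, 1, 2, 2]
```

The two functions agree to rounding error on any pair of label lists of equal length, which is the identity that the entropy form expresses.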

In the general case, the thermodynamic entropy H(i,j)
of the system formed by the union of replicas i and j is

$$
H(i,j) = \frac{\partial}{\partial \beta^{-1}} \left\{ \beta^{-1} \log \left[ \mathrm{Tr}_{ij} \left( e^{-\beta \mathcal{H}(i)} + e^{-\beta \mathcal{H}(j)} \right) \right] \right\} \tag{D2}
$$

and the entropy H(i) of system i or j is

$$
H(i) = \frac{\partial}{\partial \beta^{-1}} \left\{ \beta^{-1} \log \left[ \mathrm{Tr}_{i} \, e^{-\beta \mathcal{H}(i)} \right] \right\} \tag{D3}
$$

ℋ(i) and ℋ(j) are the Hamiltonians of replicas i and j,
and β = 1/(T ln 2) is the inverse temperature. Within
our approach, an ensemble reduces to a finite number
of points (replicas) whose correlations are monitored by
information theory measures. This form pertains to the
general case in which both i and j pertain to a collection
of decoupled copies, and the traces are over all coordinates
in replicas i and j.
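
A quick numerical check can make the thermodynamic entropy concrete. The sketch below assumes the single-system entropy is read as the derivative with respect to β^{-1} of β^{-1} log_2 Z, where Z is the trace of e^{-βℋ}, and it replaces the trace by a sum over an explicit, hypothetical list of configuration energies. Under that reading, the result equals the Shannon entropy (in bits) of the Boltzmann distribution.

```python
from math import exp, log2


def log2_partition(energies, inv_beta):
    """log2 of Z = sum_k exp(-E_k / inv_beta)."""
    return log2(sum(exp(-e / inv_beta) for e in energies))


def thermodynamic_entropy(energies, inv_beta, h=1e-6):
    """Central finite difference of d/d(beta^-1) [beta^-1 * log2 Z]."""
    def f(x):
        return x * log2_partition(energies, x)
    return (f(inv_beta + h) - f(inv_beta - h)) / (2 * h)


def boltzmann_shannon_entropy(energies, inv_beta):
    """-sum_k p_k log2 p_k with p_k = exp(-E_k / inv_beta) / Z."""
    weights = [exp(-e / inv_beta) for e in energies]
    z = sum(weights)
    return -sum((w / z) * log2(w / z) for w in weights)


energies = [0.0, 1.0, 2.5]   # hypothetical configuration energies
inv_beta = 1.3               # hypothetical value of beta^-1
```

With a single configuration the derivative vanishes, which matches the statement that the entropy is trivially zero when a replica holds only one copy of the system.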

The standard mutual information of Eq. (2) is generally
not invariant (as it ideally should be) under the
permutation of "identical" nodes (those with an identical
neighbor list that are otherwise indistinguishable by
other parameters of the system). Specifically, we refer
to nodes i and j as identical in a graph if the adjacency
matrix A is invariant under the permutation of node i
with node j [62]. That is, A commutes with the permutation
of nodes i and j, [P_ij, A] = 0, if nodes i and
j are identical. The thermodynamic entropies of Eqs.
(D2) and (D3) are invariant under permutations of identical
nodes because any symmetries, or lack thereof, are
fully represented in the system Hamiltonian ℋ.

In the simplest case with only one copy of the system
in replica i and one copy in replica j, there is only one
term in both i and j, and the designation Tr_ij becomes
redundant (the entropies of i and j are also trivially
H(i) = H(j) = 0). In a more realistic approximation
to thermodynamic quantities, each of the replicas i and
j contains a number of independent decoupled copies of
the system. Inserting Eqs. (D2) and (D3) into Eq. (D1),
we obtain the mutual information between i and j. NMI
and VI are then given by Eqs. and (H)). Other infor- 
mation measures S(i, j) between replicas i and j may also 
be computed. Along similar lines, multi-replica (higher 
than two) forms may replace the sum over two-replica 
configurations in Eqs. (fTTj) and (|E)2|) . 

We may also reconstruct the information measures using
a different physical analogy. The Shannon entropy
of Eq. (1) is analogous to an ensemble where each of
the N nodes corresponds to one point in the ensemble.
The communities correspond to q possible states
of a single particle with energies {E_k} for k = 1 to
q at a given temperature T such that the same community
occupation probabilities are reproduced as
p_k = n_k/N = e^{-βE_k} / Σ_{l=1}^{q} e^{-βE_l}, where the inverse
temperature is β = 1/(T ln 2). The mutual information I of
Eq. (2) is equivalent to an ensemble of size N for a two-particle
system in which each particle can be in any of
q states. The interaction between the two particles is
such that it leads to energies {E_ij} for the two occupied
communities i and j. These interactions lead to a
relative probability p_ij = n_ij/N for occupying the two-particle
states that is proportional to e^{-βE_ij}. The effective
Hamiltonian for the resulting physical system does
not directly depend on the identities of the N nodes (although
it does not distinguish between "identical" and
distinguishable nodes).
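
The single-particle analogy can be illustrated by constructing energies that reproduce a given set of community sizes. The sketch below makes one choice that the text leaves open: the arbitrary additive constant in the energies is set to zero, so that E_k = -ln(p_k)/β. The function names and the example sizes are hypothetical.

```python
from math import exp, log


def energies_from_sizes(sizes, temperature=1.0):
    """Return (energies, beta) such that Boltzmann weights reproduce n_k/N.

    Uses beta = 1/(T ln 2) and E_k = -ln(n_k/N)/beta (additive constant
    set to zero, which is an arbitrary choice).
    """
    n = sum(sizes)
    beta = 1.0 / (temperature * log(2))
    return [-log(s / n) / beta for s in sizes], beta


def boltzmann_probabilities(energies, beta):
    """p_k = exp(-beta E_k) / sum_l exp(-beta E_l)."""
    weights = [exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


sizes = [10, 30, 60]   # hypothetical community sizes n_k
energies, beta = energies_from_sizes(sizes)
probabilities = boltzmann_probabilities(energies, beta)
```

Larger communities map to lower energies, and the recovered probabilities equal n_k/N regardless of the temperature chosen.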

One potential limitation of our thermodynamic framework
in Eqs. (D2) and (D3) is that general, non-graph-theoretic
applications may require many copies of the
same system. The traces Tr_i and Tr_j need to be calculated

FIG. 14: (Color online) Plot of information measures I_N, V,
H, and I in panels (a) and (b) vs the Potts model weight
γ for a single realization of the benchmark suggested in [57].
The right-offset axes plot the number of clusters q. See Fig.
[2] for a complete description of the legends and axes and Sec.
[VII] for a full explanation of the benchmark. This example
plot is a multiresolution analysis for N = 5000 at μ = 0.45
where 55% of each node's edges on average are connected to
communities other than its own. The power-law distribution
exponents for the node degrees and the community sizes are
α = 2 and β = 1, respectively. We use the algorithm in
Sec. [V] to attempt to identify the "best" resolution for the
graph. For some cases in the benchmark, such as this graph
at μ = 0.45, there exists more than one extremal value of I_N
and V where the low-density configuration at (ia,b) also has
slightly stronger NMI and VI correlations (δI_N ≈ 6.3 × 10^{-5})
than the intended benchmark answer at (iia,b). In this example
case, a casual inspection indicates that the stable region at
(iia,b) is clearly the "best" partition, which also corresponds
almost exactly to the intended benchmark solution. The automated
version of the algorithm favors the slightly stronger
low-density configuration at (ia,b) as the "best" resolution
for the graph.



on multiple copies of the same system. This is bypassed 
in the application of mutual information for graph prob- 
lems because the node number N effectively plays the 
role of many ensemble points (multiple replica copies) on 
which the thermodynamic average is to be taken. 



APPENDIX E: MULTIRESOLUTION 
BENCHMARK COMMENTS 

As discussed in Sec. [VII], we used the new benchmark
problem presented in [57] to test the accuracy of our multiresolution
algorithm of Sec. [V]. Our algorithm attempts
to identify all strongly defined resolutions. By design, the
benchmark in [57] constructs a "realistic" system with a
single intended solution; however, random effects of the
graph generation process can also create additional transient,
but nevertheless strongly defined, resolutions that
our algorithm can detect. In implementing the benchmark,
we endeavor to automate the identification process
to determine the single "best" resolution as intended by
the benchmark. We explain two special cases.

The first difficulty is encountered for μ < 0.4. We
can detect multiple resolutions with perfect correlations
(I_N = 1 and V = 0) near the intended
benchmark solution, and these occur more frequently as μ
decreases. This effect is similar to partition (i) that occurred
near partition (ii) in Fig. [8]. The transitional resolutions
are not necessarily meaningless partitions on an
individual graph-by-graph basis, but they are artifacts of
the randomly generated system and thus vary across the
different benchmark graphs. Similar to structure (iia,b)
in Fig. [14], the plateaus in the information measures H
or I (or the number of clusters q [36]) indicate a more
"stable" partition. It is this stable partition that corresponds
to the intended solution for the benchmark graph.
Thus, when necessary, we use the plateaus to discriminate
between the short-lived and the most stable strongly
defined partitions in order to determine the single "best"
resolution for each benchmark graph.
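
One simple way to act on the plateau criterion is to look for the longest run of resolutions over which the number of clusters q does not change. The sketch below is only an illustration of that idea, not the procedure used for the benchmark results; the function name is hypothetical and the q values are made up. For the real-valued measures H or I, the exact equality test would need to be replaced by a tolerance.

```python
def longest_plateau(values):
    """Return (start, end), inclusive indices of the longest run of equal values.

    Ties are resolved in favor of the earliest run.
    """
    best = (0, 0)
    start = 0
    for k in range(1, len(values) + 1):
        # A run ends at the end of the list or when the value changes.
        if k == len(values) or values[k] != values[start]:
            if (k - 1) - start > best[1] - best[0]:
                best = (start, k - 1)
            start = k
    return best


# Made-up cluster counts over increasing resolution, for illustration only.
q_values = [4, 4, 5, 16, 16, 16, 16, 17, 30]
plateau = longest_plateau(q_values)
```

Among several resolutions with equally strong NMI and VI correlations, the one lying inside the longest plateau would be taken as the most stable candidate.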

A second difficulty is shown in Fig. [14], which occurs
most frequently in the range of mixing parameter
0.45 < μ < 0.65. The stable configuration that corresponds
to the intended benchmark answer is configuration
(iia,b). The low-density, transient, but strongly
correlated configuration at (ia,b) has a slightly higher
NMI correlation. Even a casual visual inspection of the
data in Fig. [14] indicates that configuration (iia,b) is the
dominant configuration for the graph. Specifically, configuration
(iia,b) possesses both very strong NMI and VI
correlations (I_N ≈ 1.0 and V ≈ 0.0) as well as stable
and long H, I, and q plateaus, and indeed it corresponds
almost exactly to the intended benchmark answer. However,
the automated application of the multiresolution algorithm
slightly favors configuration (ia,b) as the "best"
resolution since it has a higher NMI (δI_N ≈ 6.3 × 10^{-5})
and a lower VI. (See Secs. [VI B] and [VIII] regarding potential
problems of using the plateaus in H, I, or q as the
primary measure for identifying the "best" resolutions.)

These graphs are the cause of the accuracy perturbations
in Figs. [11](a) and [11](b). They are less frequent
for β = 2 since the community size distribution is more
skewed towards smaller communities than for β = 1. We
note that the average accuracy for the perturbations in
Figs. [11](a) and [11](b) is still high at I_N ≈ 0.96. In
Fig. [11], an iteration cap acted as an effective filter for
most low-density spikes. We could further improve the
automated analyses of such graphs by replacing this filter
with moving NMI or VI averages (i.e., each moving average
is over the NMI or VI of several nearby resolutions) to
exclude resolutions such as the short-lived configuration
(i).
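
The moving-average filter suggested above might look like the following sketch. It is an illustration under stated assumptions, not a tested replacement for the iteration cap: the window is centered and truncated at the ends of the resolution range, the "best" resolution is taken as the maximum of the smoothed NMI, and all names and numerical values are hypothetical.

```python
def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    out = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def best_resolution(gammas, nmi_values, window=3):
    """Return the resolution with the largest smoothed NMI."""
    smoothed = moving_average(nmi_values, window)
    k = max(range(len(smoothed)), key=smoothed.__getitem__)
    return gammas[k]


# Made-up values for illustration only: an isolated spike at gamma = 0.05
# and a broad, stable peak around gamma = 1.0.
gammas = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
nmi_values = [0.5, 0.6, 1.0, 0.6, 0.9, 0.98, 0.99, 0.98, 0.9]

raw_best = gammas[max(range(len(nmi_values)), key=nmi_values.__getitem__)]
smoothed_best = best_resolution(gammas, nmi_values)
```

In this toy case the raw maximum selects the isolated spike, while the smoothed maximum selects the broad peak. The same filter could be applied to VI by taking the minimum instead of the maximum.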